How to Unserialize File From Scratch (without library) In C++ - c++

I was given a file with data stored in a custom format, for example, "data.asd", and tasked with extracting the information out of it. I was given a file specification of the ".asd" format.
Every asd file starts at offset 0x0 with the 5 bytes "Hello" followed by a 6th byte for the \0 terminator. The next 32 bits are a pointer to an entry list, which is an array of 127 entries. Each entry contains a 16-char null-terminated string, a pointer to some data, and a size variable that gives the size of that data. The value 0xFFFFFFFF signifies the end of the list.
I've looked into using the C++ Boost serialization library, but I get errors when I try to open the file. I'm assuming Boost can only read files that it has written itself.
std::ifstream ifs("data.asd");
boost::archive::binary_iarchive in_arch(ifs);
I've since checked out serializing "manually" by opening in ifstream, copying the binary file into a vector, and then using memmove.
ifs.open(fileName, ios::in | ios::binary);
//copy all contents in binary into buffer
vector<char> buffer((
istreambuf_iterator<char>(ifs)),
(istreambuf_iterator<char>()));
memmove(s, &buffer.at(0), 6); // move char array 'hello' into string s
I should be able to figure out where the data, entry list, and strings end by checking for the terminating bytes. That way I can get by using memmove and deserialize the file by checking bytes.
For my case, is there any better option? If I am stuck using memmove, how do I figure out what the pointers point to? Using memmove I was able to move the six bytes into a string 's' and rebuild the variable, but I'm unsure how to handle the pointers.

You could memory map things and use Boost Endian.
Alternatively you could use Boost Spirit's Binary parsers: https://www.boost.org/doc/libs/1_51_0/libs/spirit/doc/html/spirit/qi/reference/binary.html
There's an example:
std::uint32_t length;
bool valid = qi::parse(first, last,
"Hello" >> qi::little_word >> char_('\0'), length);

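If a library is not an option, the "pointers" can be handled by hand. The sketch below assumes several things the question does not confirm: that each pointer is a byte offset from the start of the file, that 32-bit fields are little-endian, that an entry is laid out as 16 name bytes, a 32-bit offset and a 32-bit size with no padding, and that a 32-bit 0xFFFFFFFF where the next entry would start ends the list. The names AsdEntry, read_u32_le and parse_asd are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory form of one entry; the on-disk layout is assumed.
struct AsdEntry {
    std::string name;      // up to 16 chars, null terminated in the file
    std::uint32_t offset;  // assumed: byte offset of the data from file start
    std::uint32_t size;    // size of the data in bytes
};

// Read a 32-bit little-endian value at pos, with bounds checking.
std::uint32_t read_u32_le(const std::vector<char>& buf, std::size_t pos) {
    if (pos > buf.size() || buf.size() - pos < 4)
        throw std::runtime_error("read past end of buffer");
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        const auto byte = static_cast<unsigned char>(buf[pos + i]);
        value |= static_cast<std::uint32_t>(byte) << (8 * i);
    }
    return value;
}

// Check the "Hello\0" header, follow the entry-list offset, collect entries.
std::vector<AsdEntry> parse_asd(const std::vector<char>& buf) {
    static const char magic[6] = {'H', 'e', 'l', 'l', 'o', '\0'};
    if (buf.size() < 10 || std::memcmp(buf.data(), magic, 6) != 0)
        throw std::runtime_error("missing \"Hello\" header");

    const std::size_t entry_bytes = 16 + 4 + 4;
    std::size_t pos = read_u32_le(buf, 6);  // "pointer" to the entry list
    std::vector<AsdEntry> entries;
    for (int i = 0; i < 127; ++i, pos += entry_bytes) {
        if (read_u32_le(buf, pos) == 0xFFFFFFFFu) break;  // assumed end marker
        if (buf.size() - pos < entry_bytes)
            throw std::runtime_error("truncated entry");
        const char* p = buf.data() + pos;
        const void* nul = std::memchr(p, '\0', 16);
        const std::size_t len =
            nul ? static_cast<std::size_t>(static_cast<const char*>(nul) - p) : 16;
        entries.push_back({std::string(p, len),
                           read_u32_le(buf, pos + 16),
                           read_u32_le(buf, pos + 20)});
    }
    return entries;
}
```

Under these assumptions, the data for an entry e is the e.size bytes starting at buf.data() + e.offset, once both values have been checked against buf.size().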
Related

c++ Writing binary to a file

Hi, I'm trying to write to a txt file in binary.
So far I have written this code:
#include <iostream>
#include <fstream>
using namespace std;
int main()
{
const char* f = "abc";
ofstream ofile("D:\\foobar.txt", ios_base::out | ios_base::binary);
ofile.write(f, sizeof(char*));
return 0;
}
Now it writes "abc", but not in binary.
Can someone please tell me how to write it in binary?
First of all, you write the wrong size and might go out of bounds. Use strlen instead to get the length of the string.
Secondly, think about how a letter like 'a' is stored in memory in the computer. It is stored in whatever encoding the compiler and operating system use, which is most likely ASCII. When you write that letter to a file, the value stored in memory is written to the file, and if you read the file using a program that can decipher the encoding, it will show you the letter.
I'm just guessing here, but I think you expected binary format to write actual ones and zeroes as text. Well, you do write ones and zeroes, not as text but as individual bits. And when all those bits are put together into bytes you get the bytes as they are stored in memory. If you look at the file in a hex editor you will see the actual values, and you might even be able to find a program that shows you the actual binary values as ones and zeroes.
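To make both points concrete, here is a minimal sketch with two hypothetical helpers: write_raw writes exactly strlen(text) bytes, and to_bit_text renders bytes as '0' and '1' characters, in case that textual form is what was actually wanted. The expected bit patterns assume an ASCII-compatible encoding.

```cpp
#include <bitset>
#include <cstring>
#include <ostream>
#include <sstream>  // only needed for the usage shown below
#include <string>

// Write exactly the characters of a C string (no terminator) as raw bytes.
void write_raw(std::ostream& out, const char* text) {
    out.write(text, static_cast<std::streamsize>(std::strlen(text)));
}

// Render each byte as eight '0'/'1' characters, separated by spaces.
std::string to_bit_text(const std::string& bytes) {
    std::string result;
    for (unsigned char c : bytes) {
        if (!result.empty()) result += ' ';
        result += std::bitset<8>(c).to_string();
    }
    return result;
}
```

Calling write_raw on a std::ostringstream with "abc" leaves the three bytes a, b, c in the stream, while to_bit_text("abc") gives "01100001 01100010 01100011".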

divide information into bits bitset library

I am trying to get information from a file. The content of the file is unknown. I am opening the file using an fstream object and storing each piece of data in an unsigned char. However, the size of a char is 8 bits, and I need to get the data in 16-bit segments. I am using the bitset library to do this.
while(file>>test2)
{
file>>test2;
bitset<16> foo(test2);
cout<<foo<<endl;
}
This only gives me 8 bits of information when I use characters, and if I use another data type the program does not output anything. Is there another library that breaks a data type into bits?

Most efficient to read file into separate variables using fstream

I have tons of files which look a little like:
12-3-125-BINARYDATA
What would be the most efficient way to save the 12, 3 and 125 as separate integer variables, and the BINARYDATA as a char-vector?
I'd really like to use fstream, but I don't exactly know how to (got it working with std::strings, but the BINARYDATA part was all messed up).
The most efficient method for reading data is to read many "chunks", or records into memory using the fewest I/O function calls, then parsing the data in memory.
For example, reading 5 records with one fread call is more efficient than making 5 calls to fread that each read one record. Accessing memory is always faster than accessing external data such as files.
Some platforms have the ability to memory-map a file. This may be more efficient than reading the file using I/O functions. Profiling will determine which is the most efficient.
Fixed length records are always more efficient than variable length records. Variable length records involve either reading until a fixed size is read or reading until a terminal (sentinel) value is found. For example, a text line is a variable record and must be read one byte at a time until the terminating End-Of-Line marker is found. Buffering may help in this case.
What do you mean by binary data? Is it a sequence of '0' and '1' characters, or "real" binary data? If it is real binary data, just read the file as a binary file. First read 2 bytes for the first int, then 1 byte for the -, then 1 byte for 3, and so on, until you reach the first position of the binary data; then get the file length and read all of the rest.
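Combining the two answers, a rough sketch could read the whole file with one bulk read and then parse it in memory. It assumes the three integers are non-negative decimal text, each followed by '-', and that everything after the third '-' is raw bytes (which may themselves contain '-' or '\0'). Record, read_all and parse_record are hypothetical names.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

struct Record {  // hypothetical container for one parsed file
    int a = 0, b = 0, c = 0;
    std::vector<char> payload;
};

// Read the whole file into memory in binary mode.
std::string read_all(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Parse "12-3-125-<raw bytes>" from an in-memory buffer.
Record parse_record(const std::string& buf) {
    Record r;
    int* fields[] = {&r.a, &r.b, &r.c};
    std::size_t pos = 0;
    for (int* field : fields) {
        const std::size_t dash = buf.find('-', pos);
        if (dash == std::string::npos)
            throw std::runtime_error("missing '-' separator");
        *field = std::stoi(buf.substr(pos, dash - pos));  // throws on bad text
        pos = dash + 1;
    }
    // Everything after the third '-' is kept byte for byte.
    r.payload.assign(buf.begin() + static_cast<std::ptrdiff_t>(pos), buf.end());
    return r;
}
```

A call such as parse_record(read_all("some_file")) would then give the three integers and the payload; the file name here is only a placeholder.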

Reading and Huffman compressing 4-byte binary string STD C++ Linux Environment

I am working on some homework for Huffman coding. I already have the Huffman algorithm completed, but need to slightly alter it to work with binary files. I have spent some time reading related problems, and perhaps due to my lack of understanding of data types and binary files, I am still struggling a bit, so hopefully I am not repeating a prior question (I won't be posting code related to the Huffman part of the program).
Here is the key phrase: "You can assume that each symbol, which will be mapped to a codeword, is a 4-byte binary string." What I think I know is that char represents one byte and unsigned int represents four bytes, so I am guessing I should read the input four bytes at a time into an unsigned int buffer and then collect my data for the Huffman part of the program.
int main() {
unsigned int buffer;
fstream input;
input.open("test.txt", ios::in | ios::binary);
while(input) {
input.read(reinterpret_cast<char *>(&buffer), 4);
//if buffer does not exist as unique symbol in collection of data add it
//if buffer exists update statistics of symbol
}
input.close();
}
Does this look like a good way to handle the data? How should I handle the very end of the file if there are only 1,2, or 3 bytes left? So then I am just storing buffer as unsigned int in a struct. Just out of curiosity how would I recast buffer to a string of characters?
Edit: What's the best way to store the header of a Huffman-compressed file?
Does this look like a good way to handle the data?
Instead of casting a pointer, I would suggest using a union of an int and a char[4] and passing a pointer to the char array, as read expects. I don't know what the rest of the logic is, so I can't say whether the actual handling (which is not in the code you posted) is done in a good way, but it seems rather trivial to me.
How should I handle the very end of the file if there are only 1,2, or 3 bytes left?
Assuming each symbol is 4 bytes long, I would expect that not to be valid input.
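If the program should at least detect such a truncated tail, one possible sketch is to read into a plain char array (which needs neither a pointer cast nor a union) and check gcount() after the last, failed read. The function name count_symbols and the choice to pack the first byte into the most significant position are assumptions of this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>  // only needed for the usage shown below

// Count each complete 4-byte symbol in the stream.
// Returns the number of leftover bytes (0 to 3) that did not form a symbol.
std::size_t count_symbols(std::istream& in,
                          std::map<std::uint32_t, std::size_t>& freq) {
    char bytes[4];
    while (in.read(bytes, 4)) {  // true only when all 4 bytes were read
        std::uint32_t symbol = 0;
        for (int i = 0; i < 4; ++i)
            symbol = (symbol << 8) | static_cast<unsigned char>(bytes[i]);
        ++freq[symbol];
    }
    // After the failed read, gcount() is the size of the partial tail.
    return static_cast<std::size_t>(in.gcount());
}
```

With the input "ABCDABCDxy" this counts the symbol 0x41424344 twice and reports 2 leftover bytes, which the caller can then reject or pad as the assignment requires.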
So then I am just storing buffer as unsigned int in a struct. Just out of curiosity how would I recast buffer to a string of characters?
Why would you do that? In your data, a "character" is 4 bytes. But you can just use casting to array of bytes if you want (or, better, use bitwise operations to extract the actual bytes, if the order matters).
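The bitwise option might look like the following sketch, which fixes the byte order explicitly (most significant byte first) so that the result does not depend on the machine's endianness. The helper names to_bytes_be and to_string_be are hypothetical.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Split a 32-bit symbol into its four bytes, most significant byte first.
std::array<unsigned char, 4> to_bytes_be(std::uint32_t value) {
    return {{static_cast<unsigned char>((value >> 24) & 0xFF),
             static_cast<unsigned char>((value >> 16) & 0xFF),
             static_cast<unsigned char>((value >> 8) & 0xFF),
             static_cast<unsigned char>(value & 0xFF)}};
}

// The same four bytes as a std::string (which may contain '\0' bytes).
std::string to_string_be(std::uint32_t value) {
    const std::array<unsigned char, 4> bytes = to_bytes_be(value);
    return std::string(bytes.begin(), bytes.end());
}
```

For example, to_string_be(0x41424344) yields "ABCD" on an ASCII system.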

Binary file write problem in C++

This is my function which creates a binary file
void writefile()
{
ofstream myfile ("data.abc", ios::out | ios::binary);
streamoff offset = 1;
if(myfile.is_open())
{
char c='A';
myfile.write(&c, offset );
c='B';
myfile.write(&c, offset );
c='C';
myfile.write(&c,offset);
myfile.write(StartAddr,streamoff (16) );
myfile.close();
}
else
cout << "Some error" << endl ;
}
The value of StartAddr is 1000, hence the expected output file is:
A B C 1000 NUL NUL NUL
However, strangely my output file appends this: data.abc
So the final outcome is: A B C 1000 NUL NUL NUL data.abc
Please help me out with this. How to deal with this? Why is this strange behavior?
I recommend you quit with binary writing and work on writing the data in a textual format. You've already encountered some of the problems with writing data. There are still issues for you to come across about reading the data and portability. Expect more pain if you continue this route.
Use textual representations. For simplicity you can put one field per line and use std::getline to read it in. The textual representation allows you to view the data in any text editor, easily. Try using Notepad to view a binary file!
"Oh, but binary data is so much faster and takes up less space in the file." You've already wasted more time and money than you would gain by using binary data. The speed of computers and huge memory capacities (disk and RAM) make binary representations a thing of the past (except in extreme cases).
As a learning tool, go ahead and use binary. For ease of development and quick schedules (IOW, finishing early), use textual representations.
Search Stack Overflow for "C++ micro optimization" for the justifications.
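As a rough illustration of the one-field-per-line idea, assuming a record with just a tag and a start address (Header, save and load are hypothetical names, not taken from the question):

```cpp
#include <exception>
#include <istream>
#include <ostream>
#include <sstream>  // only needed for the usage shown below
#include <string>

struct Header {  // hypothetical record with two fields
    std::string tag;
    int start_addr = 0;
};

// Write one field per line as text.
void save(std::ostream& out, const Header& h) {
    out << h.tag << '\n' << h.start_addr << '\n';
}

// Read the fields back with std::getline; returns false on malformed input.
bool load(std::istream& in, Header& h) {
    std::string line;
    if (!std::getline(in, h.tag)) return false;
    if (!std::getline(in, line)) return false;
    try {
        h.start_addr = std::stoi(line);
    } catch (const std::exception&) {
        return false;
    }
    return true;
}
```

Saving a Header with tag "ABC" and start_addr 1000 produces the two lines "ABC" and "1000", which any text editor can display.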
There are several issues with this code.
For starters, if you want to write individual characters to a stream, you don't need to use ostream::write. Instead, just use ostream::put, as shown here:
myfile.put('A');
Second, if you want to write out a string into a file stream, just use the stream insertion operator:
myfile << StartAddr;
This is perfectly safe, even in binary mode.
As for the particular problem you're reporting, I think that the issue is that you're trying to write out a string of length four (StartAddr), but you've told the stream to write out sixteen bytes. This means that you're writing out the four bytes for the string contents, then the null terminator, and then eleven bytes of whatever happens to be in memory after the buffer. In your case, this is two more null bytes, then the meaningless text that you saw after that. To fix this, either change your code to write fewer bytes or, if StartAddr is a string, then just write it using <<.
With the line myfile.write(StartAddr,streamoff (16) ); you are instructing the myfile object to write 16 bytes to the stream starting at the address StartAddr. Imagine that StartAddr is an array of 16 bytes:
char StartAddr[16] = "1000\0\0\0data.abc"; // 15 characters plus the implicit terminator
myfile.write(StartAddr, sizeof(StartAddr));
This would generate the output that you see. Without seeing the declaration / definition of StartAddr I cannot say for certain, but it appears you are writing out a five-byte nul-terminated string "1000" followed by whatever happens to reside in the next 11 bytes after StartAddr. In this case, it appears that a couple of nul bytes followed by the constant nul-terminated string "data.abc" (which the compiler must put somewhere in memory) are what follow StartAddr.
Regardless, it is clear that you are reading past the end of a buffer.
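If the goal is a fixed-width field such as "1000 NUL NUL NUL", one way to avoid the overread is to write only the string's own characters and add the padding explicitly. This is a sketch; write_padded is a hypothetical helper and the field width is whatever the file format calls for.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <ostream>
#include <sstream>  // only needed for the usage shown below

// Write text into a field of exactly `width` bytes, padding with '\0'.
// Never reads past the terminator of `text`; longer text is truncated.
void write_padded(std::ostream& out, const char* text, std::size_t width) {
    const std::size_t len = std::min(std::strlen(text), width);
    out.write(text, static_cast<std::streamsize>(len));
    for (std::size_t i = len; i < width; ++i)
        out.put('\0');
}
```

For instance, write_padded(myfile, "1000", 7) emits the four digits followed by three NUL bytes and nothing else.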
If you are trying to write a 16 bit integer type to a stream you have a couple of options, both based on the fact that there are typically 8 bits in a byte. The 'cleanest' one would be something like:
char x = static_cast<char>(StartAddr & 0xFF); // low byte
myfile.put(x);
x = static_cast<char>((StartAddr >> 8) & 0xFF); // high byte
myfile.put(x);
This assumes StartAddr is a 16 bit integer type and does not take into account any translation that might occur (such as potential conversion of a value of 10 [a linefeed] into a carriage return / linefeed sequence).
Alternatively, you could write something like:
myfile.write(reinterpret_cast<char*>(&StartAddr), sizeof(StartAddr));
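The difference between the two options is that the first fixes the byte order in the file (low byte first), while the second writes whatever order the machine uses in memory. A small sketch of the first approach as a write/read pair, using the hypothetical names write_u16_le and read_u16_le:

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>  // only needed for the usage shown below
#include <stdexcept>

// Write a 16-bit value as two bytes, low byte first (little-endian).
void write_u16_le(std::ostream& out, std::uint16_t value) {
    out.put(static_cast<char>(value & 0xFF));
    out.put(static_cast<char>((value >> 8) & 0xFF));
}

// Read the value back in the same byte order.
std::uint16_t read_u16_le(std::istream& in) {
    const int lo = in.get();
    const int hi = in.get();
    if (!in) throw std::runtime_error("unexpected end of stream");
    return static_cast<std::uint16_t>((hi << 8) | lo);
}
```

Writing 1000 (0x03E8) this way produces the bytes 0xE8 0x03 on any platform, provided the stream was opened in binary mode.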