boost::archive::binary_iarchive instead of boost::archive::text_iarchive from an SQLite3 Blob - C++

I'm not an expert on streams and buffers, though I have learned an immense amount over the last month since I started tackling this problem.
This concerns boost::serialization and if I can get the binary archiving working, I'll save 50% of the storage space.
I've searched all over StackOverflow for the answer and pieced together the following code, which works, but only for text_iarchive. If I try to move to binary_iarchive, I get a segmentation fault with the message "boost serialize allocate(size_t n) 'n' exceeds maximum supported size", or any number of other errors that make it obvious there is a disconnect between the input stream/buffer and what binary_iarchive is expecting.
Like I said earlier, this works perfectly with text_iarchive. I can text_oarchive to an SQLite3 Blob, verify it in the database, retrieve it, text_iarchive it back into the complex object, and it works perfectly.
Is there something wrong with the way I set up the input stream and buffer?
To avoid confusing everyone, I am NOT posting the structure of the object I am serializing and deserializing. It contains many vector<double> members, an Eigen matrix, and a couple of basic objects. They work perfectly and are not part of the problem! (And yes, I delete the database records between tests to guard against reading a text_oarchive into a binary_iarchive.)
Here is the output archive section. This appears to work perfectly for text_oarchive OR binary_oarchive. The Blob shows up in the database and appears to be of the proper binary structure.
// BinaryData is a Typedef for std::vector<char>
BinaryData serializedDataStream;
bio::stream<bio::back_insert_device<BinaryData>> outbuf {serializedDataStream};
// I change the text_oarchive to binary_oarchive and uncomment the std::ios::binary parameter.
// when I'm attempting to move from text to binary
boost::archive::text_oarchive outStream(outbuf); //, std::ios::binary);
outStream << ssInputDataAndBestModel_->theModel_;
outbuf.flush();
// have to convert to unsigned char since that is the way sqlite3 expects to see
// a Blob object type
std::vector<unsigned char> buffer(serializedDataStream.begin(),serializedDataStream.end());
I then pass "buffer" to the SQLite3 processing object to store it in the Blob.
Here is the input archive section. The Blobs look identical storing and then retrieving from the DB, whether it's text or binary. (But a text archive doesn't look like a binary one, obviously.)
// this line is to get the blob out of SQLite3
currentModelDBRecPtr = cpp17::any_cast<dbo::ptr<Model>>(modelListModel);
if (!currentModelDBRecPtr->theModel.empty()) {
    // have to convert to char since that is the way boost::serialize expects to see
    // an archived object type (blob is vector of unsigned char)
    std::vector<char> blobBuffer(currentModelDBRecPtr->theModel.begin(), currentModelDBRecPtr->theModel.end());
    boost::iostreams::stream<boost::iostreams::array_source> membuf(blobBuffer.data(), blobBuffer.size());
    std::istream &input_stream = membuf;
    // Note: I change the following to binary_iarchive and uncomment the
    // std::ios::binary flag to try to move from text_iarchive to binary_iarchive
    boost::archive::text_iarchive input_archive(input_stream); //, std::ios::binary);
    TheModel inputArchiveModel;
    // it crashes on the next line, but it DOES successfully recreate half
    // of the object before it randomly crashes.
    input_archive >> inputArchiveModel;
}
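For reference, here is a minimal, self-contained sketch of the round trip I am trying to achieve, with a placeholder Model type standing in for my real object. The archives are scoped in their own blocks so they are completed before the buffer is used (the same point made in the first related answer below). Note also that the optional second constructor argument of the archives takes boost::archive flags such as no_header, not std::ios::binary.
// Minimal sketch of the intended round trip through a std::vector<char>.
// "Model" here is a stand-in for my real object, not the actual structure.
#include <boost/archive/binary_iarchive.hpp>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/device/back_inserter.hpp>
#include <boost/iostreams/stream.hpp>
#include <boost/serialization/vector.hpp>
#include <vector>

namespace bio = boost::iostreams;

struct Model {                                    // placeholder type
    std::vector<double> values;
    template <class Archive>
    void serialize(Archive& ar, unsigned /*version*/) { ar & values; }
};

int main() {
    std::vector<char> blob;                       // BinaryData in my code

    {   // serialize: archive destroyed at end of scope, completing the data
        bio::stream<bio::back_insert_device<std::vector<char>>> out{blob};
        boost::archive::binary_oarchive oa(out);
        Model m{{1.0, 2.0, 3.0}};
        oa << m;
        out.flush();
    }

    // ... store 'blob' in the SQLite3 Blob, read it back ...

    {   // deserialize directly from the raw bytes
        bio::stream<bio::array_source> in(blob.data(), blob.size());
        boost::archive::binary_iarchive ia(in);
        Model restored;
        ia >> restored;
    }
}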

Related

Issue with boost serialization not dearchiving when data length is over a specific value

Okay, so I am trying to send a struct with Boost.Asio. The send on the client side works fine and the read_until also seems fine. However, when it tries to deserialize the data back into the struct, it won't work when the size of the archive is greater than about 475 in length: the rest of the struct gets ignored for some reason and only the data field gets printed. I also added screenshots of the output. Basically, when the whole struct is not received, there is an input stream error on the line ba >> frame. I also tested with a larger file and get the same error. I even tried serializing a vector as well, so I'm not sure where my error is.
EDIT:
I figured out the issue. When I was reading from the socket I had something like this...
boost::asio::read_until(socket, buf, "\0");
This was causing weird issues reading in all the data from the boost binary archive. To fix this issue I made a custom delimiter that I appended to the archive I was sending over the socket like...
boost::asio::read_until(socket, buf, "StopReadingHere");
This fixed the weird issue of the entire boost archive string not being read into the streambuf.
First Issue
ostringstream oss;
boost::archive::text_oarchive ba(oss);
ba << frame;
string archived_data = oss.str();
Here you take the string without ensuring that the archive is complete. Fix:
ostringstream oss;
{
    boost::archive::text_oarchive ba(oss);
    ba << frame;
}
string archived_data = oss.str();
Second issue:
boost::asio::read_until(socket, buf, "\0");
string s((istreambuf_iterator<char>(&buf)), istreambuf_iterator<char>());
Here you potentially read too much into s - buf may contain additional data after the '\0'. Use the return value from read_until to know how many bytes belong to this message, copy only those (e.g. with std::copy_n), and then call buf.consume(n); see the sketch below.
If you then keep the buf instance for subsequent reads, the previously read remaining data will still be in the buffer. If you discard it instead, that will lead to problems deserializing the next message.
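A rough sketch of that suggestion (the delimiter and variable names are taken from the snippets above; needs <algorithm> and <iterator>):
boost::asio::streambuf buf;
// read_until returns the number of bytes up to and including the delimiter
std::size_t n = boost::asio::read_until(socket, buf, "StopReadingHere");

std::string s;
std::copy_n(boost::asio::buffers_begin(buf.data()), n, std::back_inserter(s));
buf.consume(n);                 // keep any leftover data for the next message
// strip the delimiter from 's' before handing it to the text_iarchive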
Risky Code?
void write(tcp::socket& socket, string data, int timeout) {
    auto time = std::chrono::seconds(timeout);
    async_write(socket, boost::asio::buffer(data), transfer_all(),
                [&](error_code error, size_t bytes_transferred) {
                });
    service.await_operation(time, socket);
}
You're using an async operation but passing a local variable (data) as the buffer. The risk is that data becomes invalid as soon as write returns.
Are you making sure that async_write has always completed before exiting from write? (It is possible that await_operation achieves this for you.)
Perhaps you are even using await_operation from my own old answer here: How to simulate boost::asio::write with a timeout. It's possible that, since things have been added, some assumptions no longer hold. I can always review a larger piece of code to check.
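One common way to remove that lifetime risk (a sketch only, reusing the names from the snippet above) is to move the data into a shared_ptr and capture it in the completion handler so it outlives the call:
void write(tcp::socket& socket, string data, int timeout) {
    auto time = std::chrono::seconds(timeout);
    auto payload = std::make_shared<std::string>(std::move(data));
    async_write(socket, boost::asio::buffer(*payload), transfer_all(),
                [payload](error_code error, size_t bytes_transferred) {
                    // 'payload' is kept alive until this handler has run
                });
    service.await_operation(time, socket);
}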

C++ Read specific parts of a file with start and endpoint

I am serializing multiple objects and want to save the given strings to a file. The structure is the following:
A few string and long attributes, then a variable number of map<long, map<string, variant>>. My first idea was to create one valid JSON file, but this is very hard to do (all of the maps are very big and my temporary memory is not big enough). Since I can't serialize everything together, I have to do it piece by piece. I am planning on doing that, and I then want to save the received strings to a file. Here is how it will look:
{ "Name": "StackOverflow"}
{"map1": //map here}
{"map2": //map here}
As you can see, this is not one valid JSON object but 3 valid JSON objects in one file. Now I want to deserialize, and I need to give a valid JSON object to the deserializer. I already save tellp() every time I write a new JSON object to the file, so in this example I would have the following addresses saved: 26, endofmap1, endofmap2.
Here is what I want to do: I want to use these addresses to extract the strings from the file I wrote to. I need one string from 0 to (26-1), one string from 26 to (endofmap1-1), and one string from endofmap1 to (endofmap2-1). Since these strings would be valid JSON objects, I could deserialize them without problems.
How can I do this?
I would create a serialize and deserialize class that you can use as part of a hierarchy.
So for instance, in rough C++ pseudo-code:
class Object : public serialize, deserialize {
public:
    int a;
    float b;
    Compound c;
    bool serialize(fstream& fs) {
        fs << a;
        fs << b;
        c.serialize(fs);
        fs.flush();
        return true;
    }
    // same for deserialize
};
class Compound : public serialize, deserialize {
public:
    map<> things;
    bool serialize(fstream& fs) {
        for (auto& thing : things) {
            fs << thing;
        }
        fs.flush();
        return true;
    }
};
With this you can use JSON, as the file will be written as you walk the hierarchy.
Update:
To extract a specific string from a file you can use something like this:
// pass in an open stream (streams are good for unit testing!)
std::string extractString(fstream& fs) {
    int location = /* the location of the start in the file */;
    int length = /* length of the string you want to extract */;
    std::string str;
    str.resize(length);
    fs.seekg(location);          // seekg (get pointer), since we are reading
    fs.read(&str[0], length);
    return str;
}
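Applied to the offsets in the question, a minimal sketch might look like this (the filename and endofmap1 are placeholders for whatever was saved via tellp()):
std::ifstream fs("objects.json", std::ios::binary);
std::streamoff start = 26;                       // first byte of the "map1" object
std::streamoff end   = endofmap1;                // tellp() saved after writing it
std::string map1(static_cast<std::size_t>(end - start), '\0');
fs.seekg(start);
fs.read(&map1[0], map1.size());                  // map1 is now one valid JSON object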
Based on you saying "my temporary memory is not big enough", I'm going to assume two possibilities (though some kind of code example may help us help you!).
possibility one, the file is too big
The issue you would be facing here isn't a new one: a file too large for memory. This assumes your algorithm isn't buffering all the data, and that your stack can handle the recursion, of course.
On Windows you can use the MapViewOfFile function; MSDN has plenty of detail on that. This function will effectively grab a "view" of a section of a file, allowing you to load just enough of the file to modify only what you need, before closing and opening a view at a later offset.
If you are on a different platform, there will be similar functions.
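A rough Win32 sketch of that approach (filename, offsets and sizes are placeholders; error handling omitted):
#include <windows.h>

// Map a read-only view of part of a file instead of loading all of it.
// Note: the view offset must be a multiple of the allocation granularity.
HANDLE file    = CreateFileA("big.json", GENERIC_READ, FILE_SHARE_READ, nullptr,
                             OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
HANDLE mapping = CreateFileMappingA(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
const char* view = static_cast<const char*>(
    MapViewOfFile(mapping, FILE_MAP_READ, 0 /*offset high*/, 0 /*offset low*/,
                  64 * 1024 /*bytes to map*/));
// ... work with 'view' ...
UnmapViewOfFile(view);
CloseHandle(mapping);
CloseHandle(file);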
possibility two, you are doing too much at once
The other option is more of a "software engineering" issue. You have so much data that, when holding it in your std::maps, you run out of heap memory.
If this is the case, you are going to need to use some clever thinking - here are some ideas!
Don't load all your data into the maps. Wherever the data is coming from, take a CRC, index, or filename of the data source. Store that information in the map and leave the actual "big strings" on the hard disk - this way you can load each item of data when you need it.
This works really well for data that needs to be sorted, or correlated.
Process or load your data when you need to write it. If you don't need to sort or correlate the data, why load it into a map beforehand at all? Just load each "big string" of data in sequence, then write them to the file with an ofstream.
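A small sketch of that second idea (chunkNames and loadChunk are placeholders for wherever the data comes from):
std::ofstream out("objects.json", std::ios::binary);
for (const std::string& name : chunkNames) {
    std::string chunk = loadChunk(name);   // fetch one "big string" at a time
    out << chunk;
    // save out.tellp() here if you need the end offset of this object
}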

Saving other files in a own data store with fstream

I am currently developing a small file store system, which should store files such as .pngs in it.
So I read the bytes from the .png into a char vector successfully; the size of the vector is the same as the size of the picture (so it should be OK).
Then I wanted to save the bytes into another .png.
Actually, I created the file successfully, but the file is completely empty.
Here is the most important code, I guess:
void storedFile::saveData(char Path[]) {
    std::fstream file;
    file.open(Path, std::ios::trunc | std::ios::out | std::ios::binary);
    if (!file.is_open())
        std::cout << "Couldn't open saved File (In Func saveData())" << std::endl;
    file.write((char*)&Data, sizeof(char) * Data.size());
    file.close();
}
I think I did it right, but it's not working.
Again, the bytes of the .png are stored in Data.
I checked after every open and read whether it succeeded, and everything worked fine (no error codes appeared).
This part looks strange:
file.write((char*)&Data,sizeof(char) * Data.size());
^^^^^^^^^^^^
Data.size() is a hint that Data is a std::vector, so &Data is actually wrong; it should be (char*)Data.data().
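For completeness, a sketch of saveData with that fix applied (assuming Data is the std::vector<char> member from the question):
void storedFile::saveData(char Path[]) {
    std::fstream file;
    file.open(Path, std::ios::trunc | std::ios::out | std::ios::binary);
    if (!file.is_open())
        std::cout << "Couldn't open saved file (in saveData())" << std::endl;
    // write the vector's contents, not the vector object itself
    file.write(Data.data(), Data.size());
    file.close();
}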

Protocol Buffers; saving data to disk & loading back issue

I have an issue with storing Protobuf data to disk.
The application I have uses Protocol Buffers to transfer data over a socket (which works fine), but when I try to store the data to disk it fails.
Actually, saving the data reports no issues, but I cannot seem to load it again properly.
Any tips would be gladly appreciated.
void writeToDisk(DataList & dList)
{
    // open streams
    int fd = open("serializedMessage.pb", O_WRONLY | O_CREAT);
    google::protobuf::io::ZeroCopyOutputStream* fileOutput = new google::protobuf::io::FileOutputStream(fd);
    google::protobuf::io::CodedOutputStream* codedOutput = new google::protobuf::io::CodedOutputStream(fileOutput);

    // save data
    codedOutput->WriteLittleEndian32(PROTOBUF_MESSAGE_ID_NUMBER); // store with message id
    codedOutput->WriteLittleEndian32(dList.ByteSize());           // the size of the data I will serialize
    dList.SerializeToCodedStream(codedOutput);                    // serialize the data

    // close streams
    delete codedOutput;
    delete fileOutput;
    close(fd);
}
I've verified the data inside this function; the dList contains the data I expect. The streams report that no errors occur and that a reasonable number of bytes were written to disk (and the file is of a reasonable size).
But when I try to read back the data, it does not work. Moreover, what is really strange is that if I append more data to this file, I can read the first messages (but not the one at the end).
void readDataFromFile()
{
    // open streams
    int fd = open("serializedMessage.pb", O_RDONLY);
    google::protobuf::io::ZeroCopyInputStream* fileinput = new google::protobuf::io::FileInputStream(fd);
    google::protobuf::io::CodedInputStream* codedinput = new google::protobuf::io::CodedInputStream(fileinput);

    // read back
    uint32_t sizeToRead = 0, magicNumber = 0;
    string parsedStr = "";
    codedinput->ReadLittleEndian32(&magicNumber);   // the message id-number I expect
    codedinput->ReadLittleEndian32(&sizeToRead);    // the reported data size, also what I expect
    codedinput->ReadString(&parsedStr, sizeToRead); // the size() of 'parsedStr' is much less than it should be (sizeToRead)

    DataList dl = DataList();
    if (dl.ParseFromString(parsedStr)) // fails
    {
        // work with data if all okay
    }

    // close streams
    delete codedinput;
    delete fileinput;
    close(fd);
}
Obviously I have omitted some of the code here to simplify everything.
As a side note, I have also tried to serialize the message to a string and save that string via CodedOutputStream. That does not work either. I have verified the contents of that string, though, so I guess the culprit must be the stream functions.
This is a Windows environment: C++ with protocol buffers and Qt.
Thank you for your time!
I solved this issue by switching from file descriptors to fstream, and from FileOutputStream to OstreamOutputStream.
Although I've seen examples using the former, it didn't work for me.
I found a nice code example hidden in the google coded_stream header. link #1
Also, since I needed to serialize multiple messages to the same file using protocol buffers, this link was enlightening. link #2
For some reason, the output file is not 'complete' until I actually destruct the stream objects.
The read failure was because the file was not opened for reading with O_BINARY - change the file opening to this and it works:
int fd = open("serializedMessage.pb", O_RDONLY | O_BINARY);
The root cause is the same as here: "read() only reads a few bytes from file". You were very likely following an example in the protobuf documentation which opens the file in the same way, but it stops parsing on Windows when it hits a special character in the file.
Also, in more recent versions of the library you can use protobuf::util::ParseDelimitedFromCodedStream to simplify reading size+payload pairs; a sketch follows below.
... the question may be ancient, but the issue still exists and this answer is almost certainly the fix to the original problem.
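As a sketch of those delimited helpers (assuming a protobuf version that ships google/protobuf/util/delimited_message_util.h, and reusing the DataList type from the question):
#include <fstream>
#include <google/protobuf/io/zero_copy_stream_impl.h>
#include <google/protobuf/util/delimited_message_util.h>

void writeToDisk(DataList& dList)
{
    std::ofstream out("serializedMessage.pb", std::ios::binary);
    // writes the message size followed by the payload
    google::protobuf::util::SerializeDelimitedToOstream(dList, &out);
}

void readDataFromFile()
{
    std::ifstream in("serializedMessage.pb", std::ios::binary);
    google::protobuf::io::IstreamInputStream raw(&in);
    google::protobuf::io::CodedInputStream coded(&raw);

    DataList dl;
    bool clean_eof = false;
    if (google::protobuf::util::ParseDelimitedFromCodedStream(&dl, &coded, &clean_eof))
    {
        // work with data if all okay
    }
}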
Try to use codedinput->ReadRaw instead of ReadString, and dl.ParseFromArray instead of ParseFromString.
I'm not very familiar with protocol buffers, but ReadString might only read a field of type string.

How to get boost::iostream to operate in a mode comparable to std::ios::binary?

I have the following question on boost::iostreams. If someone is familiar with writing filters, I would appreciate your advice/help.
I am writing a pair of multichar filters that work with boost::iostreams::filtering_stream as a data compressor and decompressor.
I started by writing the compressor, picked an algorithm from the LZ family, and am now working on the decompressor.
In a couple of words, my compressor splits data into packets, which are encoded separately and then flushed to my file.
When I have to restore data from my file (in programming terms, serve a read(byte_count) request), I have to read a full packed block, buffer it, unpack it, and only then give out the requested number of bytes. I've implemented this logic, but right now I'm struggling with the following problem:
When my data is packed, any symbol can appear in the output file. And I have trouble reading a file which contains the symbol (hex 1A, char 26) using boost::iostreams::read(...., size).
If I were using std::ifstream, for example, I would have set the std::ios::binary mode and then this symbol could be read without issue.
Is there any way to achieve the same when implementing a boost::iostreams filter which uses the boost::iostreams::read routine to read a char sequence?
Some code here:
// Compression
// -----------
filtering_ostream out;
out.push(my_compressor());
out.push(file_sink("file.out"));
// Compress the 'file.in' to 'file.out'
std::ifstream stream("file.in");
out << stream.rdbuf();
// Decompression
// -------------
filtering_istream in;
in.push(my_decompressor());
in.push(file_source("file.out"));
std::string res;
while (in) {
    std::string t;
    // My decompressor wants to retrieve the full block from input (say, 4096 bytes)
    // but instead retrieves 150 bytes because it meets the '1A' char in the char sequence.
    // That obviously happens because the file should be read as a binary one, but
    // how do I state that?
    std::getline(in, t); // <--------- The error happens here
    res += t;
}
Short answer for reading the file as binary:
specify ios_base::binary when opening the file stream.
MSDN Link
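Applied to the code in the question, a sketch with every file device opened in binary mode (so a 0x1A byte is not treated as end-of-file on Windows) would be:
filtering_ostream out;
out.push(my_compressor());
out.push(file_sink("file.out", std::ios::out | std::ios::trunc | std::ios::binary));

std::ifstream stream("file.in", std::ios::in | std::ios::binary);
out << stream.rdbuf();

filtering_istream in;
in.push(my_decompressor());
in.push(file_source("file.out", std::ios::in | std::ios::binary));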