Protocol Buffers: saving data to disk & loading back issue - C++

I have an issue with storing Protobuf data to disk.
The application I have uses Protocol Buffers to transfer data over a socket (which works fine), but when I try to store the data to disk it fails.
Actually, saving the data reports no issues, but I cannot seem to load it again properly.
Any tips would be gladly appreciated.
void writeToDisk(DataList & dList)
{
    // open streams
    int fd = open("serializedMessage.pb", O_WRONLY | O_CREAT);
    google::protobuf::io::ZeroCopyOutputStream* fileOutput = new google::protobuf::io::FileOutputStream(fd);
    google::protobuf::io::CodedOutputStream* codedOutput = new google::protobuf::io::CodedOutputStream(fileOutput);

    // save data
    codedOutput->WriteLittleEndian32(PROTOBUF_MESSAGE_ID_NUMBER); // store with message id
    codedOutput->WriteLittleEndian32(dList.ByteSize());           // the size of the data I will serialize
    dList.SerializeToCodedStream(codedOutput);                    // serialize the data

    // close streams
    delete codedOutput;
    delete fileOutput;
    close(fd);
}
I've verified the data inside this function; the dList contains the data I expect. The streams report that no errors occur and that a reasonable number of bytes were written to disk (the file is also of a reasonable size).
But when I try to read the data back, it does not work. Moreover, what is really strange is that if I append more data to this file, I can read the first messages (but not the one at the end).
void readDataFromFile()
{
    // open streams
    int fd = open("serializedMessage.pb", O_RDONLY);
    google::protobuf::io::ZeroCopyInputStream* fileinput = new google::protobuf::io::FileInputStream(fd);
    google::protobuf::io::CodedInputStream* codedinput = new google::protobuf::io::CodedInputStream(fileinput);

    // read back
    uint32_t sizeToRead = 0, magicNumber = 0;
    string parsedStr = "";
    codedinput->ReadLittleEndian32(&magicNumber);   // the message id-number I expect
    codedinput->ReadLittleEndian32(&sizeToRead);    // the reported data size, also what I expect
    codedinput->ReadString(&parsedStr, sizeToRead); // the size() of 'parsedStr' is much less than it should be (sizeToRead)

    DataList dl = DataList();
    if (dl.ParseFromString(parsedStr)) // fails
    {
        // work with data if all okay
    }

    // close streams
    delete codedinput;
    delete fileinput;
    close(fd);
}
Obviously I have omitted some of the code here to simplify everything.
As a side note, I have also tried to serialize the message to a string and save that string via CodedOutputStream. This does not work either. I have verified the contents of that string, though, so I guess the culprit must be the stream functions.
This is a Windows environment, C++ with Protocol Buffers and Qt.
Thank you for your time!

I solved this issue by switching from file descriptors to fstream, and from FileOutputStream to OstreamOutputStream.
Although I've seen examples using the former, it didn't work for me.
I found a nice code example hidden in the Google coded_stream header. link #1
Also, since I needed to serialize multiple messages to the same file using Protocol Buffers, this link was enlightening. link #2
For some reason, the output file is not 'complete' until I actually destruct the stream objects.
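A minimal sketch of that approach (assuming the same DataList message and PROTOBUF_MESSAGE_ID_NUMBER constant as in the question) could look like this; note that the protobuf stream objects are scoped so they are destructed, and therefore flushed, before the file is used:
#include <fstream>
#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/zero_copy_stream_impl.h>

void writeToDisk(DataList& dList)
{
    std::ofstream file("serializedMessage.pb",
                       std::ios::out | std::ios::trunc | std::ios::binary);
    {
        google::protobuf::io::OstreamOutputStream fileOutput(&file);
        google::protobuf::io::CodedOutputStream codedOutput(&fileOutput);
        codedOutput.WriteLittleEndian32(PROTOBUF_MESSAGE_ID_NUMBER);
        codedOutput.WriteLittleEndian32(dList.ByteSize());
        dList.SerializeToCodedStream(&codedOutput);
    } // the streams flush their buffers only when destructed here
    file.close();
}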

The read failure was because the file was not opened for reading with O_BINARY; change the file opening to this and it works:
int fd = open("serializedMessage.pb", O_RDONLY | O_BINARY);
The root cause is the same as here: "read() only reads a few bytes from file". You were very likely following an example in the protobuf documentation which opens the file in the same way, but on Windows a text-mode read stops as soon as it hits a special character (0x1A is treated as end-of-file) in the file.
Also, in more recent versions of the library, you can use protobuf::util::ParseDelimitedFromCodedStream to simplify reading size+payload pairs.
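For example (a sketch assuming a protobuf version that ships google/protobuf/util/delimited_message_util.h; note these helpers write a varint length prefix rather than the fixed-width size used in the question):
#include <google/protobuf/util/delimited_message_util.h>

// append one message, prefixed with its size
bool appendMessage(const DataList& dList, std::ostream& out)
{
    return google::protobuf::util::SerializeDelimitedToOstream(dList, &out);
}

// read back one size+payload pair
bool readMessage(google::protobuf::io::CodedInputStream& in, DataList& dList)
{
    bool clean_eof = false; // true if the stream ends exactly on a message boundary
    return google::protobuf::util::ParseDelimitedFromCodedStream(&dList, &in, &clean_eof);
}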
... the question may be ancient, but the issue still exists and this answer is almost certainly the fix to the original problem.

Try using
codedinput->ReadRaw instead of ReadString
and
dl.ParseFromArray instead of ParseFromString
I'm not very familiar with protocol buffers, but ReadString might only read a field of type string.
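A sketch of that suggestion, reusing sizeToRead and codedinput from the question (assumes <vector> is included):
std::vector<char> buffer(sizeToRead);
if (codedinput->ReadRaw(buffer.data(), sizeToRead)) // read exactly sizeToRead raw bytes
{
    DataList dl;
    if (dl.ParseFromArray(buffer.data(), sizeToRead)) // parse straight from the buffer
    {
        // work with data if all okay
    }
}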

Related

Issue with boost serialization not dearchiving when data length is over a specific value

Okay, so I am trying to send a struct with boost::asio. The send on the client side works fine, and the read_until also seems fine. However, when it tries to deserialize the data back into the struct, it won't work when the size of the archive is greater than about 475 characters. The rest of the struct gets ignored for some reason and only the data field gets printed. I also added screenshots of the output. Basically, when the whole struct is not received, there is an input stream error on the line ba >> frame. I also tested with a larger file and get the same error. I even tried serializing a vector as well, so I'm not sure where my error is.
EDIT:
I figured out the issue. When I was reading from the socket I had something like this:
boost::asio::read_until(socket, buf, "\0");
This was causing weird issues reading in all the data from the boost binary archive. To fix this issue I made a custom delimiter that I appended to the archive I was sending over the socket, like:
boost::asio::read_until(socket, buf, "StopReadingHere");
This fixed the weird issue of the entire boost archive string not being read into the streambuf.
First issue:
ostringstream oss;
boost::archive::text_oarchive ba(oss);
ba << frame;
string archived_data = oss.str();
Here you take the string without ensuring that the archive is complete. Fix:
ostringstream oss;
{
    boost::archive::text_oarchive ba(oss);
    ba << frame;
} // the archive's destructor completes the archive
string archived_data = oss.str();
Second issue:
boost::asio::read_until(socket, buf, "\0");
string s((istreambuf_iterator<char>(&buf)), istreambuf_iterator<char>());
Here you potentially read too much into s: buf may contain additional data after the '\0'. Use the return value from read_until together with e.g. std::copy_n, followed by buf.consume(n).
If you then keep the buf instance for subsequent reads, you will still have the previously read remaining data in the buffer. If you discard it, that will lead to problems deserializing the next message.
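A sketch of that pattern with the delimiter from the edit above (buffers_begin/copy_n are one way to extract exactly n bytes; note that n includes the delimiter itself, and <algorithm> and <iterator> are assumed to be included):
std::size_t n = boost::asio::read_until(socket, buf, "StopReadingHere");
std::string s;
s.reserve(n);
std::copy_n(boost::asio::buffers_begin(buf.data()), n, std::back_inserter(s));
buf.consume(n); // leave any bytes read past the delimiter in buf for the next message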
Risky Code?
void write(tcp::socket& socket, string data, int timeout) {
    auto time = std::chrono::seconds(timeout);
    async_write(socket, boost::asio::buffer(data), transfer_all(),
                [&](error_code error, size_t bytes_transferred) {});
    service.await_operation(time, socket);
}
You're using an async operation, but passing a local variable (data) as the buffer. The risk is that data becomes invalid as soon as write returns.
Are you making sure that async_write always completes before exiting from write? (It is possible that await_operation achieves this for you.)
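One common way to remove that risk (a sketch, keeping your transfer_all and await_operation calls as they are) is to give the buffer shared ownership, so the data lives until the completion handler has run:
void write(tcp::socket& socket, string data, int timeout) {
    auto time = std::chrono::seconds(timeout);
    auto owned = std::make_shared<string>(std::move(data)); // handler keeps this alive
    async_write(socket, boost::asio::buffer(*owned), transfer_all(),
                [owned](error_code error, size_t bytes_transferred) {
                    // 'owned' is captured by value, so the data outlives write()
                });
    service.await_operation(time, socket);
}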
Perhaps you are even using await_operation from my own old answer here: How to simulate boost::asio::write with a timeout. It's possible that, since things were added, some assumptions no longer hold. I can always review a larger piece of code to check.

Reopening a closed file stream

Consider the following code,
auto fin = ifstream("address", ios::binary);
if (fin.is_open())
    fin.close();
for (auto i = 0; i < N; ++i) {
    fin.open();
    // ....
    // read (next) b bytes...
    // ....
    fin.close();
    // Some delay
}
The code above can't be written in the C++ I know, but I'd like to know whether something like it is possible.
Here are my requirements:
When reopening the file, there would be no need to pass the parameters (path and mode) again.
When reopening the stream, it continues from the point in the stream where it was when it got closed.
Clarification
The files I work with are big, and at some point in time other threads from third-party libraries may decide to (re)move them. An open stream will prevent such actions.
Continuously reading a big file will slow down the system.
The need
Indeed, a file can't be deleted by another process as long as a stream keeps it open.
I suppose you have already asked yourself these questions, but for the record I have to suggest you think about them:
Can't the file be read into (virtual) memory and discarded when no longer needed?
Can't the file processing be pipelined asynchronously, reading it at once and processing it without unnecessary delays?
What do you do if the file can no longer be opened because it was deleted by the other process? What do you do if the location can't be found because the file was modified (e.g. shortened)?
If you had the perfect solution to your issue, what would the effect be if the other process tried to delete the file while it is open (only for a short time, but nevertheless open and blocking the deletion)?
The solution
Unfortunately, you can't achieve the desired behavior with standard streams. You could emulate it by keeping track of the filename and of the position (and more generally of the state):
auto mypos = ifs.tellg(); // saves position.
                          // Should flags be saved as well? And what about gcount?
ifs.close();
...
if (!ifs.is_open()) {
    ifs.open(myfilename, myflags); // open again!
    if (!ifs) {
        // ouch! file disappeared ==> process error
    }
    ifs.seekg(mypos); // restore position
    if (!ifs) {
        // ouch! position no longer reachable ==> process error
    }
}
Of course, you wouldn't like to repeat this code over and over. And it would not be nice to suddenly have a lot of global variables keeping track of the stream's state. But you could very easily encapsulate it in a wrapper class that takes care of saving and restoring the stream's state using existing standard operations, as sketched below.
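For illustration, a minimal sketch of such a wrapper (the class name ReopenableReader is made up here, and only the read position is saved):
#include <fstream>
#include <string>

class ReopenableReader {
public:
    explicit ReopenableReader(std::string filename,
                              std::ios::openmode mode = std::ios::binary)
        : filename_(std::move(filename)), mode_(mode | std::ios::in) {}

    bool open() {                       // reopen and restore the saved position
        ifs_.open(filename_, mode_);
        if (!ifs_)
            return false;               // file disappeared ==> process error
        ifs_.seekg(pos_);
        return static_cast<bool>(ifs_); // position unreachable ==> process error
    }

    void close() {                      // remember the position, then release the file
        pos_ = ifs_.tellg();
        ifs_.close();
    }

    std::istream& stream() { return ifs_; }

private:
    std::string filename_;
    std::ios::openmode mode_;
    std::streampos pos_ = 0;
    std::ifstream ifs_;
};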

Reading binary data in memory into a stringstream and stream this over a socket

I would like to know if it is possible to, for instance, take a piece of data in memory, read it into an output stringstream (as binary data) and write this onto a socket for a client application to process.
The problem I run into while attempting this is the following:
Example:
char name[1024] = "Test";
std::ostringstream message(std::stringstream::out | std::stringstream::binary);
int len = strlen(name);
message.write(reinterpret_cast<const char*>(&len), sizeof(int));
message.write(name, len * sizeof(char));
I want to write this stringstream to the socket with all of the data in it, but the problem is this: only the first write to the stringstream seems to take effect, in this case writing 4 (the length of the string), and none of the subsequent writes. Am I missing something here?
If this is not the best way to do it, what would be the best way to accomplish this? This is partly to reduce file I/O for cached memory snapshots.
Thanks in advance.
Your code (with minor fixes) appears to work for me, so you should check that you are correctly handling the buffered binary data, i.e. that you do not assume the std::string contains a C-style string.
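For instance, a sketch of handing the buffered bytes to the socket (a plain send call on a hypothetical connected socket sockfd stands in for your socket layer):
const std::string payload = message.str();
// payload may contain embedded '\0' bytes, so pass data()/size() to the socket,
// never c_str() together with a strlen-style length
send(sockfd, payload.data(), payload.size(), 0);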

Force read on disk with std::ifstream instead of the file cache

I have a program that loads data from a file using std::ifstream and stores the data in a structure. After that, I verify whether the data I want was in the file. If it is not, I ask the user to modify the file and press a key, and I then reload the file. The problem is that even if the user modified the file, I always get the same data, because the file seems to be cached by the application. I've seen that in the Win32 API it's possible to use the flag FILE_FLAG_NO_BUFFERING to avoid a buffered copy when reading a file, but I would like to use that feature with std::ifstream. Is there any way to use a handle created through the Win32 API with ifstream, or any way to force this behavior directly in std::ifstream?
Here's a "simplified" code sample:
SomeStructure s = LoadData(fileName);
while (!DataValid(s))
    s = LoadData(fileName);

SomeStructure LoadData(const std::string& fileName)
{
    std::ifstream fileStream;
    while (!OpenFileRead(fileName, fileStream))
    {
        std::cout << "File not found, please update it";
        fileStream.close();
        // Wait for user input
        std::string dummy;
        std::getline(std::cin, dummy);
    }

    // ... Read file, fill structure, and return
    std::string line;
    while (std::getline(fileStream, line) && line != "")
    {
        // At this point, I can see that line is wrong
        StringArray namedatearray = Utils::String::Split(line, "|");
        assert(namedatearray.size() == 2);
        // Add data to my structure (a map)
    }
    fileStream.close();
    // return structure
}

bool OpenFileRead(const std::string& name, std::ifstream& file)
{
    file.open(name.c_str(), std::ios::in);
    return !file.fail();
}
Thanks.
Edit: Of course, it was a mistake: I had the same file twice in two very similar paths. Looking at the handle of the open file with Process Explorer (and not at the relative file path) made me find it.
Instead of thinking that this is due to some kind of "buffering", I would look for the obvious things first.
Are you sure the user is changing the same file that you're reading?
Are you certain reloading the data is properly updating your data structure in memory?
Are you confident that DataValid() is doing what you want?
The fact that the OS uses file buffers to increase disk performance is generally not visible from the application level. As long as you're looking at the same file, the OS knows that the user updated the file, and if you reopen it, then you'll see the changed data. If the data never even had a chance to get flushed to disk, that won't affect your application.

How to get boost::iostream to operate in a mode comparable to std::ios::binary?

I have the following question on boost::iostreams. If someone is familiar with writing filters, I would appreciate your advice/help.
I am writing a pair of multichar filters that work with boost::iostreams::filtering_stream as a data compressor and decompressor.
I started by writing the compressor, picked an algorithm from the LZ family, and am now working on the decompressor.
In a couple of words, my compressor splits data into packets, which are encoded separately and then flushed to my file.
When I have to restore data from my file (in programming terms, serve a read(byte_count) request), I have to read a full packed block, buffer it, unpack it and only then give out the requested number of bytes. I've implemented this logic, but right now I'm struggling with the following problem:
When my data is packed, any symbols can appear in the output file. And I have trouble reading a file which contains the symbol 0x1A (decimal 26) using boost::iostreams::read(..., size).
If I were using std::ifstream, for example, I would have set the std::ios::binary mode and then this symbol could be read without trouble.
Is there any way to achieve the same when implementing a boost::iostreams filter which uses the boost::iostreams::read routine to read a char sequence?
Some code here:
// Compression
// -----------
filtering_ostream out;
out.push(my_compressor());
out.push(file_sink("file.out"));
// Compress the 'file.in' to 'file.out'
std::ifstream stream("file.in");
out << stream.rdbuf();
// Decompression
// -------------
filtering_istream in;
in.push(my_decompressor());
in.push(file_source("file.out"));
std::string res;
while (in) {
std::string t;
// My decompressor wants to retrieve the full block from input (say, 4096 bytes)
// but instead retrieves 150 bytes because meets '1A' char in the char sequence
// That obviously happens because file should be read as a binary one, but
// how do I state that?
std::getline(in, t); // <--------- The error happens here
res += t;
}
Short answer for reading the file as binary:
specify ios_base::binary when opening the file stream.
MSDN Link
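Applied to the code above, that means passing a binary open mode everywhere the file is touched (boost::iostreams' file_sink and file_source accept a std::ios_base::openmode argument):
std::ifstream stream("file.in", std::ios::binary);   // the raw input
out.push(file_sink("file.out", std::ios::binary));   // the compressed output
in.push(file_source("file.out", std::ios::binary));  // reading the compressed file back
Note that std::getline is still line-oriented and will drop the '\n' bytes it consumes; for a raw byte-for-byte round trip, something like boost::iostreams::copy is a common alternative.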