modify raw protobuf stream - c++

Let's say I have compiled an application (Receiver) with the following proto file:
syntax = "proto3";

message Control {
    bytes version = 1;
    uint32 id = 2;
    bytes color = 3;
}
and I have another application (Transmitter) which initially has the same proto file but after an update a new field is added like:
syntax = "proto3";

message Control {
    bytes version = 1;
    uint32 id = 2;
    bytes color = 3;
    uint32 color_id = 4;
}
I have noticed that if the Receiver app parses the message, changes some data, and then serializes it back, the extra fields coming from the Transmitter app are dropped.
I need a way to change the id field by accessing the raw bytes directly, without a full parse/serialize round trip. Is that possible?
This is needed because the Control message has some "header" fields that I know will never change, while other fields can be added or changed in the Transmitter app's proto due to app updates.
I have looked at https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.io.coded_stream
but I was not able to modify an existing byte stream, and ReadString cannot work out the string length on its own.
Thanks in advance

I don't think there is an official way to do it. You could do it by hand, following protobuf's encoding documentation (https://developers.google.com/protocol-buffers/docs/encoding#structure).
Basically you would do this:
start decoding at the very first byte
decode tag by tag until you reach the field number of the id
identify the bytes representing the id and replace them with your new (encoded!) id
This is fragile for several reasons. Most importantly, your code has to know details about the message structure and content (the field number and wire type of your id), and this is exactly what you want to avoid when using protocol buffers (you always need some information from the .proto files).
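For illustration, here is a minimal hand-rolled sketch of those steps. All helper names are made up, it only handles top-level fields, and it does no validation; the buffer is rebuilt rather than patched in place because the re-encoded varint may be shorter or longer than the old one.

#include <cstdint>
#include <stdexcept>
#include <vector>

// Read one base-128 varint from buf, advancing pos past it.
static uint64_t ReadVarint(const std::vector<uint8_t>& buf, size_t& pos) {
    uint64_t result = 0;
    for (int shift = 0; shift < 64 && pos < buf.size(); shift += 7) {
        uint8_t b = buf[pos++];
        result |= uint64_t(b & 0x7F) << shift;
        if (!(b & 0x80)) return result;
    }
    throw std::runtime_error("truncated or oversized varint");
}

// Append `value` to `out` in varint encoding.
static void WriteVarint(uint64_t value, std::vector<uint8_t>& out) {
    while (value >= 0x80) {
        out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<uint8_t>(value));
}

// Rewrite the value of the top-level varint field `field` (for the question:
// id = 2) in a serialized message, copying everything else verbatim.
std::vector<uint8_t> PatchVarintField(const std::vector<uint8_t>& in,
                                      uint32_t field, uint64_t new_value) {
    std::vector<uint8_t> out;
    size_t pos = 0;
    while (pos < in.size()) {
        size_t tag_start = pos;
        uint64_t tag = ReadVarint(in, pos);
        size_t value_start = pos;
        uint32_t field_num = static_cast<uint32_t>(tag >> 3);
        uint32_t wire_type = static_cast<uint32_t>(tag & 7);
        switch (wire_type) {                      // skip over the payload
            case 0: ReadVarint(in, pos); break;   // varint
            case 1: pos += 8; break;              // fixed64
            case 2: { uint64_t len = ReadVarint(in, pos); pos += len; break; }
            case 5: pos += 4; break;              // fixed32
            default: throw std::runtime_error("unsupported wire type");
        }
        if (field_num == field && wire_type == 0) {
            // keep the tag, replace the old value with the re-encoded one
            out.insert(out.end(), in.begin() + tag_start, in.begin() + value_start);
            WriteVarint(new_value, out);
        } else {
            // copy tag + payload untouched, unknown fields included
            out.insert(out.end(), in.begin() + tag_start, in.begin() + pos);
        }
    }
    return out;
}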

In proto2 syntax, the protobuf C++ library preserved unknown fields, so that when you re-encoded the message they would remain. Unfortunately this feature (like many others) was removed in the proto3 syntax.
One workaround could be to do it this way:
Set only the new id value in a fresh Receiver message and encode it.
Append this data after the original binary data.
This relies on the protobuf rule that for non-repeated fields, the last value seen on the wire wins, so appended messages override the original values of those fields.
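A minimal sketch of that workaround, assuming the Receiver's generated header is named control.pb.h:

#include <cstdint>
#include <string>
#include "control.pb.h"  // generated from the Receiver's .proto (assumed name)

// Because the last occurrence of a non-repeated field on the wire wins,
// appending a tiny message that sets only `id` overrides the old id while
// leaving everything else in `original_bytes` (unknown fields included) intact.
std::string PatchId(const std::string& original_bytes, uint32_t new_id) {
    Control patch;
    patch.set_id(new_id);
    return original_bytes + patch.SerializeAsString();
}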
Hmm, actually, reading the relevant protobuf issue reports, it seems that unknown field preservation for proto3 is available again in protobuf version 3.5 and newer.

Just deserialize the entire message and map it onto the new message. That is the cleanest way. You do not have a lot of data and probably no real-time requirements. Write a small mapper and do not overthink the problem.
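For illustration, a sketch of such a mapper; the two generated headers and type names are hypothetical, standing for the old and the updated schema:

#include "control_v1.pb.h"  // hypothetical: old schema, compiled under its own name
#include "control_v2.pb.h"  // hypothetical: updated schema with the new fields

// Copy the fields both versions share; fields added by the newer schema
// simply keep their defaults until someone sets them.
ControlV2 MapControl(const ControlV1& in) {
    ControlV2 out;
    out.set_version(in.version());
    out.set_id(in.id());
    out.set_color(in.color());
    return out;
}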

Related

Omit fields when printing Protobuf message

Is it possible to choose which fields, or at least which field types, are included when calling message.DebugString() in Google Protobuf?
I have the following message description:
message Message
{
    optional string name = 1;
    optional int32 blockSize = 2;
    optional bytes block = 3;
}
I only want to print name and blockSize and omit the block field, which happens to be large (e.g. 64 KB) and whose content is insignificant.
I built a method that adds only the fields of interest to a std::stringstream, but it has to be modified for every change in the message description.
Your best bet is to make a copy of the message, clear block from the copy, then print it.
Message copy = original;
copy.clear_block();
cout << copy.DebugString() << endl;
Note that there's no performance concern here because DebugString() itself is already much slower than making a copy of the message.
If you want to make this more general, you could write some code based on protobuf reflection which walks over the copied message and removes all fields of type bytes with long sizes.
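For example, a sketch of that reflection walk, handling only singular bytes fields (repeated and nested messages are left out):

#include <cstddef>
#include <google/protobuf/descriptor.h>
#include <google/protobuf/message.h>

// Clears every singular `bytes` field longer than max_len from *msg, using
// reflection so it works for any message type without code changes.
void StripLargeBytesFields(google::protobuf::Message* msg, size_t max_len) {
    const google::protobuf::Descriptor* desc = msg->GetDescriptor();
    const google::protobuf::Reflection* refl = msg->GetReflection();
    for (int i = 0; i < desc->field_count(); ++i) {
        const google::protobuf::FieldDescriptor* field = desc->field(i);
        if (field->type() == google::protobuf::FieldDescriptor::TYPE_BYTES &&
            !field->is_repeated() &&
            refl->HasField(*msg, field) &&
            refl->GetString(*msg, field).size() > max_len) {
            refl->ClearField(msg, field);
        }
    }
}

Combined with the copy-then-print idiom above: copy the message, call StripLargeBytesFields(&copy, 1024), then print copy.DebugString().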

protobuf repeatedly writes out a non-repeated field while serializing

I have a weird problem.
I added one field to my protobuf message, like this:
{
    optional uint32 avg_body_length = 90;
    optional uint32 pub_body_length = 95; // new field
}
My program (a service running continuously) updates pub_body_length and then writes the message to storage periodically.
After several hours, I found that the size of the serialized protobuf grew rapidly.
Analyzing the serialized protobuf manually, it turns out the field pub_body_length appears thousands of times with the same value.
I use C++, and the version of protoc is 2.4.1.
Does anyone have any clue?

Parsing error in Protocol Buffer using C++ API

I have a test.proto file containing the code shown below. I am using the code generated from this file in my client/server program.
message Person {
    required string user_name = 1;
    optional int32 favourite_number = 2;
    repeated string interests = 3;
}
On the client side I have no problem sending data, but on the server side I get a protocol buffer parsing error (in protobuf\message_lite.cc, line 123) saying "Can't parse message of type "Person" because it is missing required fields: user_name".
I have checked my client side and could not find anything wrong, but I might be missing something on the server side that is not reading the string data correctly?
// Server-side code for Protocol Buffers
Person filldata;
google::protobuf::uint32 size;
// here I might need google::protobuf::string stsize? Not sure
google::protobuf::io::ArrayInputStream ais(buffer, filldata.ByteSize());
CodedInputStream coded_input(&ais);
coded_input.ReadVarint32(&size);
// have tried both coded_input.ReadString and coded_input.ReadRaw here
filldata.ParseFromCodedStream(&coded_input);
cout << "Message is " << filldata.DebugString();
// still getting the same error; no idea what to do to fix it :(
I have looked here but still could not figure it out from that explanation. I hope someone can fix it.
Thanks!
google::protobuf::io::ArrayInputStream ais(buffer,filldata.ByteSize());
At this point, filldata is a newly-initialized message, so filldata.ByteSize() is zero. You are therefore telling protobuf to parse an empty array; hence no fields are set, and you get a missing-required-fields error. Messages have variable length, so you need to make sure the exact message size is passed along with the data, for example as a length prefix written by the sender.
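A sketch of what the server-side read could look like under that assumption, i.e. the client writes a varint length prefix followed by the message bytes:

#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/zero_copy_stream_impl_lite.h>
#include "test.pb.h"  // generated from test.proto

// Reads one length-prefixed Person out of `buffer` (buffer_size is the
// number of bytes actually received, not ByteSize() of an empty message).
bool ReadPerson(const void* buffer, int buffer_size, Person* person) {
    google::protobuf::io::ArrayInputStream ais(buffer, buffer_size);
    google::protobuf::io::CodedInputStream coded_input(&ais);

    google::protobuf::uint32 size;
    if (!coded_input.ReadVarint32(&size)) return false;

    // Restrict parsing to exactly `size` bytes so trailing data is ignored.
    google::protobuf::io::CodedInputStream::Limit limit = coded_input.PushLimit(size);
    bool ok = person->ParseFromCodedStream(&coded_input) &&
              coded_input.ConsumedEntireMessage();
    coded_input.PopLimit(limit);
    return ok;
}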

Protocol Buffers - Reading header (nested message) common across all messages

I am currently evaluating Protocol Buffers for use in a project (no code written as of yet). One of the things I'm unclear on is how you would read part of an encoded message. For example, say I have a common header:
message Header {
    required uint16 msg_type = 1;
    required uint16 length = 2;
}
And say I deliver multiple different messages to a queue. How would the consumer work out how much data to read per message, and which message type it should be constructed as?
There should be no need for a Header message here; the most common approach is to follow the "streaming" advice from the protobuf documentation. Within that, you could either treat the stream as a sequence of identical union-type messages, or (my preference) when writing, instead of just writing a length prefix before each message, write a varint that indicates the message type, then the length (as a varint). The number that indicates the message type comes from an arbitrary mapping you invent (1 = Foo, 2 = Bar, 3 = Blap, etc.). If you left-shift the message type by 3 bits and then "or" 2, the result is also a well-formed protobuf stream in its own right, 100% identical to a repeated YourUnionType.
Basically, this is exactly the same as the plain length-prefix approach, but instead of being field 1 each time, the field number varies per message type. Most implementations have a reader/writer API that makes it possible to read and write raw varints and to length-restrict the reader. Some implementations have helper mechanisms that support streams of heterogeneous messages directly (basically, doing all of the above for you).
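A sketch of the writer side of that scheme; the msg_type numbers come from your own mapping, and the message classes are placeholders:

#include <cstdint>
#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/message_lite.h>

// Writes one record as if it were field number `msg_type` (wire type 2) of an
// enclosing message, so the whole stream also parses as a valid protobuf.
void WriteRecord(google::protobuf::io::CodedOutputStream* out,
                 uint32_t msg_type,
                 const google::protobuf::MessageLite& msg) {
    out->WriteVarint32((msg_type << 3) | 2);  // tag: (field << 3) | wire type 2
    out->WriteVarint32(static_cast<google::protobuf::uint32>(msg.ByteSizeLong()));
    msg.SerializeToCodedStream(out);
}

On the read side you would read the tag, map tag >> 3 back to the concrete type, read the length, push a limit, and parse, mirroring the reader APIs mentioned above.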
In a recent project, I used Protocol Buffers like this:
We had one 'container' message that included all the actual messages as optional members:
message ContainerMessage {
    optional Message1 message_1 = 1;
    optional Message2 message_2 = 2;
    // ...
    optional MessageN message_N = N;
}
Inside an application, you could just use ContainerMessage as a discriminated union of the real Messages.
Between applications, we serialized/deserialized the ContainerMessage and sent the serialized content, prefixed with a simple header containing the length of the serialized content.
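For example, dispatch on the receiving side could look like this (the generated header name is assumed):

#include "container.pb.h"  // generated from the .proto above (assumed name)

// Treat ContainerMessage as a discriminated union: exactly one of the
// optional members is expected to be set per message.
void Dispatch(const ContainerMessage& container) {
    if (container.has_message_1()) {
        // handle container.message_1() ...
    } else if (container.has_message_2()) {
        // handle container.message_2() ...
    }
    // ... and so on for the other members
}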
That will depend on the protocol you are using.
Note that a lot of protocols go over serial interfaces, where you might have extra lines signalling when a message starts and stops.
Often, messages will have their length at a fixed offset after the message start.
In other cases, you might need to parse the message element by element to find out how much of it is left. For example, a string embedded in the message may have a fixed length, carry its length at the beginning, or use \0 as an end marker.
Mostly, when you store messages in a queue for further processing, you will want to add some extra information to make your life easier. For example, when an external signal tells you where the message stops, you might store the message internally together with its length.

Curlpp, incomplete data from request

I am using curlpp to send requests to various web services to send and receive data.
So far this has worked fine, since I had only used it for sending/receiving JSON data.
Now I have a situation where a web service returns a ZIP file in binary form, and here I encountered a problem: the data received is not complete.
I first had curl write any data to an ostringstream by using the WriteStream option, but this proved not to be the correct approach, since the data contains null characters and thus stopped at the first null char.
After that, instead of WriteStream, I used WriteFunction with a callback function.
The problem in this case is that the function is always called only 2 or 3 times, regardless of the amount of data.
This results in always having a few chunks of data that don't seem to be the first part of the file, although the data always starts with PK as the first 2 characters, indicating a ZIP file.
I used several tools to verify that the data is entirely being sent to my application, so this is not a problem with the web service.
Here is the code. Note that options like hostname, port, headers, and post fields are set elsewhere.
string requestData;

size_t WriteStringCallback(char* ptr, size_t size, size_t nmemb)
{
    requestData += ptr;
    int totalSize = size * nmemb;
    return totalSize;
}

const string CurlRequest::Perform()
{
    curlpp::options::WriteFunction wf(WriteStringCallback);
    this->request.setOpt(wf);
    this->request.perform();
    return requestData;
}
I hope someone can help me out with this issue, because I've run dry of leads on how to fix it, and curlpp is poorly documented (even more so since the curlpp website disappeared).
The problem with the code is that the data is put into a std::string even though it is binary (ZIP) data. I'd recommend putting the data into a stream (or a binary array) instead.
You can also register a callback to retrieve the response headers and act in the write callback according to the Content-Type. Use curlpp::options::HeaderFunction to register such a callback.
std::string is not the problem; the concatenation is:
requestData += ptr;
A C string (ptr) is assumed to be zero-terminated, so if the input contains any zero bytes the appended data is truncated there (and since curl's buffer is not actually NUL-terminated, this can even read past its end). You should append it with the explicit length instead:
requestData += std::string(ptr, size*nmemb);
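Putting it together, a corrected version of the callback from the question:

string requestData;

// Append exactly size * nmemb bytes; embedded zero bytes are preserved.
size_t WriteStringCallback(char* ptr, size_t size, size_t nmemb)
{
    size_t totalSize = size * nmemb;
    requestData.append(ptr, totalSize);
    return totalSize;
}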