protobuf required field and default value - c++

I am new to protobuf and I have started with the following trivial example:
message Entry {
required int32 id = 1;
}
used by this C++ code:
#include <iostream>
#include <string>
#include "example.pb.h"

int main() {
    std::string mySerialized;
    {
        Entry myEntry;  // "id" is never set
        std::cout << "Serialization successful "
                  << myEntry.SerializeToString(&mySerialized) << std::endl;
        std::cout << mySerialized.size() << std::endl;  // prints 0
    }
    Entry myEntry;
    std::cout << "Deserialization successful "
              << myEntry.ParseFromString(mySerialized) << std::endl;
}
Even though the "id" field is required, the size of the serialized buffer is 0 because the field has not been set (??).
When I deserialize the message, an error occurs:
[libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse message of type "Entry" because it is missing required fields: id
Is this normal behavior?
Francesco
P.S. If I initialize "id" to 0, the behavior is different.
P.P.S. SerializeToString returns true, but ParseFromString returns false.

I don't think I fully understand your question, but I'll have a go at an answer anyway. I hope this helps you in some way :)
Yes, this is normal behavior. Mark a field as required only if the message makes no sense without it; semantically, a required field should never be skipped. To enforce this, protobuf refuses to parse a message in which a required field is missing.
It sees that field number 1 is required and that has_id() returns false, so it rejects the whole message. Note that an unset field is simply not written when serializing, which is why your buffer is empty; once you set id (even to 0), the field is written and the parse succeeds.
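To see why the buffer is empty, here is a hand-written sketch of the proto2 wire encoding for Entry. It is an illustration, not the generated code: std::optional stands in for has_id(), and the function names are made up. Field 1 with wire type 0 (varint) has the tag byte 0x08, followed by the value as a varint. When the field is unset nothing is written at all, while setting id to 0 produces the two bytes 0x08 0x00, which is why the P.S. case behaves differently.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Append v as a base-128 varint: low 7 bits first, high bit = "more bytes follow".
void AppendVarint(std::string& out, uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<char>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<char>(v));
}

// Hypothetical encoder for: message Entry { required int32 id = 1; }
// An empty optional models has_id() == false: nothing is emitted.
std::string SerializeEntry(std::optional<int32_t> id) {
    std::string out;
    if (id) {
        AppendVarint(out, (1u << 3) | 0u);  // tag: field 1, wire type 0
        // int32 values are sign-extended to 64 bits before varint encoding.
        AppendVarint(out, static_cast<uint64_t>(static_cast<int64_t>(*id)));
    }
    return out;
}
```

An empty buffer therefore decodes to an Entry with has_id() false, and the required-field check rejects it.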
The developer guide advises against using required fields:
Required Is Forever: You should be very careful about marking fields as required. If at some point you wish to stop writing or sending a required field, it will be problematic to change the field to an optional field – old readers will consider messages without this field to be incomplete and may reject or drop them unintentionally. You should consider writing application-specific custom validation routines for your buffers instead. Some engineers at Google have come to the conclusion that using required does more harm than good; they prefer to use only optional and repeated. However, this view is not universal.
It also says:
Any new fields that you add should be optional or repeated. This means that any messages serialized by code using your "old" message format can be parsed by your new generated code, as they won't be missing any required elements. You should set up sensible default values for these elements so that new code can properly interact with messages generated by old code. Similarly, messages created by your new code can be parsed by your old code: old binaries simply ignore the new field when parsing. However, the unknown fields are not discarded, and if the message is later serialized, the unknown fields are serialized along with it – so if the message is passed on to new code, the new fields are still available. Note that preservation of unknown fields is currently not available for Python.

Related

register ErrorCollector or intercept parse errors for wire format?

It is possible to define a custom ErrorCollector class for handling google::protobuf parsing errors:
struct ErrorCollector : ::google::protobuf::io::ErrorCollector
{
    void AddError(int line, int column, const std::string& message) override
    {
        // log error
    }
    void AddWarning(int line, int column, const std::string& message) override
    {
        // log warning
    }
};
When parsing from a text file, you can use the protobuf TextFormat class and register your custom ErrorCollector with its parser:
::google::protobuf::io::IstreamInputStream input_stream(&file);
::google::protobuf::TextFormat::Parser parser;
ErrorCollector error_collector;
parser.RecordErrorsTo(&error_collector);
if (parser.Parse(&input_stream, &msg))
{
    // handle msg
}
For parsing the wire format, I currently use Message::ParseFromArray:
if (msg.ParseFromArray(data, data_len))
{
    // handle msg
}
This doesn't allow me to specify a custom ErrorCollector, though.
I've searched through the source code, but so far I haven't been able to find out whether this is possible.
Is it possible to use an ErrorCollector when parsing wire format?
Is there another way to intercept parse errors and make them available to client code?
There are essentially two ways that parsing the wire format could fail:
1. The bytes are not a valid protobuf (e.g. they are corrupted, or in a totally different format).
2. A required field is missing.
For case 1, protobuf does not give you any more information than "it's invalid". This is partly for code simplicity (and speed), but it is also partly because any attempt to provide more information usually turns out to be more misleading than helpful. Detailed error reporting is useful for text format because text is often written by humans, but machines make very different kinds of errors. In some languages, protobuf actually reports specific errors like "end-group tag does not match start-group tag". In the vast majority of cases, this error really just means "the bytes are corrupted", but inevitably people think the error is trying to tell them something deeper which they do not understand. They then post questions to Stack Overflow like "How do I make sure my start-group and end-group tags match?" when they really should be comparing bytes between their source and destination to narrow down where they got corrupted. Even reporting the byte position where the parse error occurred is not very useful: protobuf is a dense encoding, which means that many random corrupt byte sequences will parse successfully, so the parser may only notice a problem somewhere later down the line rather than at the point where things actually went wrong.
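To make the "dense encoding" point concrete, here is a simplified, schema-free structural check of the wire format. It is my own illustration, not protobuf's parser: it rejects groups (wire types 3 and 4) and ignores field-number limits. Even the two ASCII bytes "hi" pass it: 0x68 decodes as a tag for field 13 with wire type 0, and 0x69 as the varint value 105.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct Field { uint64_t number; int wireType; };

// Reads one base-128 varint; false if truncated or longer than 64 bits.
static bool ReadVarint(const std::string& in, size_t& pos, uint64_t& v) {
    v = 0;
    for (int shift = 0; shift < 64 && pos < in.size(); shift += 7) {
        uint8_t b = static_cast<uint8_t>(in[pos++]);
        v |= static_cast<uint64_t>(b & 0x7F) << shift;
        if ((b & 0x80) == 0) return true;
    }
    return false;
}

// Walks tag/value pairs, skipping each value by wire type without a schema.
// Returns false if the bytes are not structurally valid.
bool ScanWireFormat(const std::string& in, std::vector<Field>& fields) {
    size_t pos = 0;
    while (pos < in.size()) {
        uint64_t tag, value;
        if (!ReadVarint(in, pos, tag) || (tag >> 3) == 0) return false;
        int wireType = static_cast<int>(tag & 7);
        switch (wireType) {
            case 0:  // varint
                if (!ReadVarint(in, pos, value)) return false;
                break;
            case 1:  // fixed64
                if (in.size() - pos < 8) return false;
                pos += 8;
                break;
            case 2:  // length-delimited: length varint, then payload
                if (!ReadVarint(in, pos, value) || value > in.size() - pos) return false;
                pos += value;
                break;
            case 5:  // fixed32
                if (in.size() - pos < 4) return false;
                pos += 4;
                break;
            default:  // groups (3, 4) and invalid types (6, 7)
                return false;
        }
        fields.push_back({tag >> 3, wireType});
    }
    return true;
}
```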
The one case that is clearly useful to distinguish is case 2 (missing required fields), at least if you use required fields (I personally recommend avoiding them). There are a couple of options here:
Normally, required field checks write errors to the console (on stderr). You can intercept these and record them your own way using SetLogHandler, but this doesn't give you structured information, only text messages.
To check required fields more programmatically, you can separate required field checking from parsing. Use MessageLite::ParsePartialFromArray() or one of the other Partial parsing methods to parse a message while ignoring the absence of required fields. You can then use MessageLite::IsInitialized() to check whether all required fields are set. If it returns false, use Message::FindInitializationErrors() to get a list of paths of all required fields that are missing.
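A sketch of that second option, wrapped so that both failure modes are reported separately. The helper ParseWithDiagnostics and the ParseOutcome struct are hypothetical names; the helper is a template only so it can be exercised without generated code. With a real google::protobuf::Message it relies on exactly the three calls described above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Result of a wire-format parse, keeping the two failure modes apart.
struct ParseOutcome {
    bool wireFormatOk = false;         // case 1: bytes were structurally valid
    std::vector<std::string> missing;  // case 2: paths of unset required fields
    bool ok() const { return wireFormatOk && missing.empty(); }
};

// Msg is expected to be a google::protobuf::Message (or anything exposing
// the same three member functions).
template <typename Msg>
ParseOutcome ParseWithDiagnostics(Msg& msg, const void* data, int size) {
    ParseOutcome out;
    // Parse without enforcing required fields.
    out.wireFormatOk = msg.ParsePartialFromArray(data, size);
    if (out.wireFormatOk && !msg.IsInitialized()) {
        // Appends the path of each required field that is still unset.
        msg.FindInitializationErrors(&out.missing);
    }
    return out;
}
```

With generated code you would call ParseWithDiagnostics(msg, data, data_len) and log out.missing when wireFormatOk is true but ok() is false; when wireFormatOk is false, all you know is that the bytes are invalid.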

MongoDB finding using an _id

I am using the latest version of the new C++ MongoDB driver/library (not the legacy, 26compat or C version) along with the Qt framework (latest 64-bit version on Linux). Within the same program I am successfully reading and writing to the database and everything is working well.
I realize this version is unstable, but I don't want the Boost dependencies, and it is a new project that only I am working on.
I'm not a professional programmer, so please forgive any knowledge gaps.
In my database I have a supporting collection that just remembers the last project a user was working on. What I want to do is use a value stored as a string field in that document to load that project when the program starts.
I want to use the key stored in the m_Current_Project_key variable to load the project data from the project collection.
In the code below, the first (commented-out) line after the find statement carries out the search using a different hard-coded field name and value in the same collection, just to prove that the code works in general.
The problem I am having is getting the program to search for a specific "_id" that I can see is correctly in the collection and document from the mongo command line.
The comments on the end of the lines of code below show the output achieved for different things I have tried.
This code sits within a method that reads a different collection from the same database and gets a value from it, which it puts in the m_Current_Project_key variable (a QString).
qDebug() << m_Current_Project_key; // "553b976484e4b5167e39b6f1"
qDebug() << Utility::format_key(m_Current_Project_key); // "ObjectId("553b976484e4b5167e39b6f1")" - this utility function just modifies the value passed to it to look like the output
QString test = Utility::format_key(m_Current_Project_key);
test.remove('\"');
qDebug() << test; // "ObjectId(553b976484e4b5167e39b6f1)"
char const *c = m_Current_Project_key.toStdString().c_str();
qDebug() << c; // 553b976484e4b5167e39b6f1
bsoncxx::oid hhh(c, 12);
qDebug() << hhh.get_time_t(); // 892679010
auto cursor = db["project"].find(document{}
// << "title" << "Testing Project"
<< "_id"
<< c
// << hhh
// << m_Current_Project_key.toStdString()
// << m_Current_Project_key.toStdString().c_str()
// << Utility::format_key(m_Current_Project_key).toStdString()
// << test.toStdString()
<< finalize);
The cursor only points to a value when I use the title line above without the next two lines; the value I get is the document I want, but in the real situation the only thing the program would know at this point is the "_id". The project name might not be unique.
I have tried casting the std::string to an OID, but that wasn't recognised as a type.
I've done a lot of Googling and a lot of trying things out, and I can't believe there isn't a straightforward way to find a document based on its "_id". The only find examples I have seen use fields other than "_id".
On the Mongo command line, this does what I want:
db.project.find({ "_id" : ObjectId("553b976484e4b5167e39b6f1")}, { title : 1 })
I would appreciate any assistance I could get with this, I have spent a lot of time trying.
Thanks.
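The issue here is that you are using the wrong bsoncxx::oid constructor. The two-argument constructor oid(const char*, size_t) expects the 12 raw bytes of an ObjectId, so it took the first 12 characters of your hex string as bytes (which is why get_time_t() returned a meaningless value). When creating an oid from a std::string containing the hex representation of the ObjectId (e.g. "553b976484e4b5167e39b6f1"), you should use the single-argument constructor that takes a stdx::string_view. Also note that c in your code points into a temporary std::string that is destroyed at the end of that statement, so it is already dangling when you use it.
The correct code looks like this:
// Keep the std::string alive while the string_view refers to it.
const std::string key = m_Current_Project_key.toStdString();
auto cursor = db["project"].find(document{}
    << "_id"
    << bsoncxx::oid{bsoncxx::stdx::string_view{key}}
    << finalize
);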
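To illustrate the difference between the two constructors, here is a standalone sketch in plain C++ (no bsoncxx; the function names are hypothetical) that decodes the 24-character hex form into 12 bytes, versus copying the first 12 characters verbatim. The first 4 bytes of an ObjectId are a big-endian Unix timestamp, so reading the ASCII characters "553b" as bytes gives exactly the 892679010 printed in the question.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

using ObjectIdBytes = std::array<uint8_t, 12>;

// Hex form: two hex digits per byte (what the string_view constructor parses).
ObjectIdBytes FromHex(const std::string& hex) {
    if (hex.size() != 24) throw std::invalid_argument("expected 24 hex characters");
    auto nibble = [](char c) -> uint8_t {
        if (c >= '0' && c <= '9') return static_cast<uint8_t>(c - '0');
        if (c >= 'a' && c <= 'f') return static_cast<uint8_t>(c - 'a' + 10);
        if (c >= 'A' && c <= 'F') return static_cast<uint8_t>(c - 'A' + 10);
        throw std::invalid_argument("not a hex digit");
    };
    ObjectIdBytes out{};
    for (size_t i = 0; i < 12; ++i)
        out[i] = static_cast<uint8_t>((nibble(hex[2 * i]) << 4) | nibble(hex[2 * i + 1]));
    return out;
}

// Raw form: the first 12 characters taken as bytes (the mistake in the question).
ObjectIdBytes FromRawBytes(const std::string& s) {
    ObjectIdBytes out{};
    for (size_t i = 0; i < 12 && i < s.size(); ++i) out[i] = static_cast<uint8_t>(s[i]);
    return out;
}

// The first 4 bytes of an ObjectId: big-endian seconds since the Unix epoch.
uint32_t TimestampOf(const ObjectIdBytes& id) {
    return (uint32_t{id[0]} << 24) | (uint32_t{id[1]} << 16) |
           (uint32_t{id[2]} << 8) | id[3];
}
```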

Get offset of node within rapidjson?

I am deserializing a JSON string into an object using rapidjson. When I encounter an issue, not with the structure of the JSON but with its content, I want to report an error stating the offset where the problem is.
Unfortunately, unless it is a parse error, I don't see where I can get the current offset of a Value within a Document. Does anyone have a way of accomplishing this?
For example:
Document doc;
doc.Parse<0>(json.c_str());
if( doc.HasMember( "Country" ) ) {
const Value& country_node = doc["Country"];
if( !isValid(country_node.GetString()) )
cout << "Invalid country specified at position " << country_node.Offset()?????
}
Unfortunately, RapidJSON does not support this in the DOM API.
If you use the SAX API, when you encounter an invalid value, you can return false in the handler function, and the Reader will generate a kParseErrorTermination error with the offset.
This is not supported in the DOM because it would incur memory overhead for a feature that would rarely be used. Please open an issue on GitHub if you would like to discuss this feature further with the community.

ProtocolBuffer, abort() on SerializeToArray()

I made a Protocol Buffers object from the proto class I usually use, and I need to serialize it. I take the object and call SerializeToArray() on it like this:
int size = messageObject.ByteSize();
void* buffer = malloc(size);
messageObject.SerializeToArray(buffer, size);
As far as I know, there is no problem with this, since the object has data in it (I checked by breaking right before the SerializeToArray line).
When the method is called, however, it triggers an abort() that I know nothing about.
I have no idea what it could be. The only data included in this object is a "type" enumerator (which I set to the type of data being used in this object, since it can carry different sorts of messages) and one message object in the repeated field.
message MessageID
{
enum Type { LOGINDATA = 1; PLAYERDATA = 2; WORLDDATA = 3; }
// Identifies which field is filled in.
required Type type = 1;
// One of the following will be filled in.
repeated PlayerData playerData = 2;
optional WorldData worldData = 3;
optional LoginData loginData = 10;
}
This is the base message. So, type is 2 in this case, which stands for PLAYERDATA. Also, playerData is set with a single object of type PlayerData.
Any help is appreciated.
Any time the protobuf library aborts (which should only happen in debug mode or in severe circumstances), it prints information about the problem to the console. If your app doesn't have a console, you can use google::protobuf::SetLogHandler to direct the information somewhere else:
https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.common#SetLogHandler.details
typedef void LogHandler(LogLevel level, const char* filename,
int line, const std::string& message);
LogHandler* SetLogHandler(LogHandler* new_func);
The protobuf library sometimes writes warning and error messages to stderr.
These messages are primarily useful for developers, but may also help end users figure out a problem. If you would prefer that these messages be sent somewhere other than stderr, call SetLogHandler() to set your own handler. This returns the old handler. Set the handler to NULL to ignore log messages (but see also LogSilencer, below).
Obviously, SetLogHandler is not thread-safe. You should only call it at initialization time, and probably not from library code. If you simply want to suppress log messages temporarily (e.g. because you have some code that tends to trigger them frequently and you know the warnings are not important to you), use the LogSilencer class below.
The only reason for an abort that I know of (which only applies in debug builds) is if some required field isn't set. You say that the type field is set, so there must be a required field in PlayerData which is not set.
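A sketch of checking this up front instead of relying on the debug-build abort. SerializeChecked is a hypothetical helper; with a generated message it relies on IsInitialized(), InitializationErrorString() and SerializeToString(). Serializing into a std::string also avoids the manual malloc() in the question, which is never freed and whose SerializeToArray() result is never checked.

```cpp
#include <cassert>
#include <string>

// Msg is expected to be a generated protobuf message class.
// Reports missing required fields instead of serializing an incomplete message.
template <typename Msg>
bool SerializeChecked(const Msg& msg, std::string* out, std::string* error) {
    if (!msg.IsInitialized()) {
        // Names the required fields (including nested ones) that are unset.
        *error = msg.InitializationErrorString();
        return false;
    }
    return msg.SerializeToString(out);
}
```

For example, SerializeChecked(messageObject, &bytes, &err) returning false with err naming a field inside playerData would confirm that PlayerData has an unset required field.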

Include pre-encoded protocol buffer message within outer message

Is there a way to create a protocol buffer message in C++ that contains a pre-encoded inner message, without parsing and then re-serializing the inner message?
To clarify, consider the following message definitions:
message Inner {
required int32 i = 1;
// ... more fields ...
}
message Outer {
repeated Inner inners = 1;
// ... more fields ...
}
Suppose you have a collection of 10 byte arrays, each of which contains an encoded version of an Inner. You'd like to create an Outer that contains the 10 Inners. You don't want to hand-encode because Outer has other fields and may itself be included in other messages. Is there a way to get protocol buffers to directly copy the pre-encoded Inner?
There is no clean way, but there are a few hacky ways. One is to define a second message like this:
message RawOuter {
repeated bytes inners = 1;
// ... same fields as Outer ...
}
RawOuter is identical to Outer except that the inners repeated field has been changed from type Inner to type bytes. If you populate inners with the encoded instances of Inner, then serialize the RawOuter, you get exactly the same result as if you had built an Outer with the parsed versions. That is to say, the wire format for a nested message is identical to the wire format for a bytes field containing the serialization of that nested message. This is one of those funny exploitable quirks of the protobuf encoding.
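The quirk is easy to verify by hand. In the sketch below (hand-written encoders for illustration, not protobuf's API; the function names are hypothetical), a nested Inner and a bytes field holding Inner's serialization both go through the same wire-type-2 path: tag, byte length, payload. With i = 150, both produce the bytes 0a 03 08 96 01.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

static void AppendVarint(std::string& out, uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<char>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<char>(v));
}

// Wire type 2 (length-delimited): tag, payload length as a varint, payload.
static void AppendLengthDelimited(std::string& out, uint32_t field,
                                  const std::string& payload) {
    AppendVarint(out, (uint64_t{field} << 3) | 2);
    AppendVarint(out, payload.size());
    out += payload;
}

// message Inner { required int32 i = 1; }
std::string EncodeInner(int32_t i) {
    std::string out;
    AppendVarint(out, (1u << 3) | 0);  // field 1, wire type 0 (varint)
    AppendVarint(out, static_cast<uint64_t>(static_cast<int64_t>(i)));
    return out;
}

// Outer: repeated Inner inners = 1; each element is a nested message.
std::string EncodeOuter(const std::vector<int32_t>& values) {
    std::string out;
    for (int32_t v : values) AppendLengthDelimited(out, 1, EncodeInner(v));
    return out;
}

// RawOuter: repeated bytes inners = 1; each element is pre-encoded bytes.
std::string EncodeRawOuter(const std::vector<std::string>& encodedInners) {
    std::string out;
    for (const std::string& bytes : encodedInners) AppendLengthDelimited(out, 1, bytes);
    return out;
}
```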
This hack has some problems, though. In particular, it doesn't work well if you're trying to build an Outer instance that is embedded in some other proto, since you probably don't want to maintain two copies of every containing message, one using Outer and one using RawOuter.
Another, even hackier option is to inject the encoded messages into the Outer instance's UnknownFieldSet.
Outer outer;
for (auto& inner: inners) {
outer.mutable_unknown_fields()
->AddLengthDelimited(1, inner);
}
The UnknownFieldSet is intended to store fields seen while parsing that do not match any known field number defined in the .proto file. The idea is that this allows you to write a proxy server that simply receives messages and forwards them to another server without having to re-compile the proxy every time you add a new field to the protocol. Here, we're abusing it by sticking a value into it that actually corresponds to a known field, but the implementation will not notice, and so it will write out these fields just fine.
The main problem with this approach is that if anyone else inspects your Outer instance in the meantime, it will appear to them as if the inners list is empty, since the values are actually hidden somewhere else. This is a pretty ugly hack that will probably come back to haunt you later. I would only recommend it if you have measured the performance difference and found it to be large.
Also note that the serialization code always writes unknown fields last, whereas known fields are written in order by field number. Parsers are supposed to accept any order, but occasionally you'll find someone who is using the unparsed data as a hash map key or something and that totally breaks if the fields are re-ordered.
By the way, you can improve performance of both of these approaches by swapping the strings into place rather than copying, i.e.
raw_outer->add_inners()->swap(inner);
or
outer->mutable_unknown_fields()->AddLengthDelimited(1)->swap(inner);