Get offset of node within rapidjson? - c++

I am deserializing a json string into an object using rapidjson. When I encounter an issue, not with the structure of the json, but with the content, I want to report an error stating the offset of where the problem is.
Unfortunately, unless it is a parse error, I don't see where I can get the current offset of a Value within a Document. Anyone have any ways of accomplishing this?
For example:
Document doc;
doc.Parse<0>(json.c_str());
if( doc.HasMember( "Country" ) ) {
const Value& country_node = doc["Country"];
if( !isValid(country_node.GetString()) )
cout << "Invalid country specified at position " << country_node.Offset()?????
}

Unfortunately, RapidJSON does not support this in the DOM API.
If you use the SAX API, when you encounter an invalid value, you can return false in the handler function, and the Reader will generate a kParseErrorTermination error with the offset.
The reason why this is not supported in DOM because this will incur memory overhead and may only be used rarely. Please drop an issue at GitHub if you would like to further discuss this feature with the community.

Related

How to use the cursor in the new C++ Mongo Driver

I am using the new C++ driver to access MongoDB from my C++ program.
Through the tutorial I am able to fetch an entire collection from the DB.
I am also able to specify filters so I only get a few.
But once I get the collection data into the program, there is only a single example available for inspecting the data:
for (auto&& doc : cursor) {
std::cout << bsoncxx::to_json(doc) << std::endl;
}
I would like to know how to get the count of the collection
I would also like to know how to get number "i" in the return data, i.e.,:
cursor[i] or similar... which of course doesn't work.
Thanks for pointing out this oversight in our examples. If you would, please file a bug in the Documentation component at https://jira.mongodb.org/browse/CXX requesting that our examples include more detail on how to access data on the client.
You have two questions here, really:
How can I get a count? The unhelpful answer is that you could probably write std::distance(cursor.begin(), cursor.end()), but you probably don't want to do that, as it would require pulling all of the data back from the server. Instead, you likely want to invoke mongocxx::collection::count.
How can I get the Nth element out of a cursor? First, are you sure this is what you want? The obvious way would be to do auto view = *std::next(cursor.begin(), N-1), but again, this probably isn't what you want for the reasons above, and also because the order is not necessarily specified. Instead, have a look at mongocxx::options::find::sort, mongocxx::options::find::limit, and mongocxx:options::find::skip, which should give you finer control over what data is returned via the cursor, and in what order.
Many thanks, acm!
I filed the bug and I figured out how to do it. To help others, let me post the two code examples here:
auto db = conn["db-name"];
int count = db["collection-name"].count( {} );
And
mongocxx::options::find opts;
opts.limit( 1 );
auto cursor = db["db-name"].find({ }, opts);
bsoncxx::document::view doc = *cursor.begin();
std::cout << bsoncxx::to_json(doc) << std::endl;

register ErrorCollector or intercept parse errors for wire format?

When It is possible to define a custom ErrorCollector class for handling google::protobuf parsing errors
struct ErrorCollector : ::google::protobuf::io::ErrorCollector
{
void AddError(int line, int column, const std::string& message) override
{
// log error
}
void AddWarning(int line, int column, const std::string& message) override
{
// log warning
}
};
When parsing from a text file, you can use the protobuf TextFormat class and register your custom ErrorCollector
::google::protobuf::io::IstreamInputStream input_stream(&file);
::google::protobuf::TextFormat::Parser parser;
ErrorCollector error_collector;
parser.RecordErrorsTo(&error_collector);
if (parser.Parse(&input_stream, &msg))
{
// handle msg
}
For parsing wire format, I currently use Message::ParseFromArray
if (msg.ParseFromArray(data, data_len))
{
// handle msg
}
This doesn't allow me to specify a custom ErrorCollector though.
I've searched through the source code, but as of yet have been unable to find if this is possible.
Is it possible to use an ErrorCollector when parsing wire format?
Is there another way to intercept parse errors and make them available to client code?
There are essentially two ways that parsing the wire format could fail:
The bytes are not a valid protobuf (e.g. they are corrupted, or in a totally different format).
A required field is missing.
For case 1, protobuf does not give you any more information than "it's invalid". This is partly for code simplicity (and speed), but it is also partly because any attempt to provide more information usually turns out more misleading than helpful. Detailed error reporting is useful for text format because text is often written by humans, but machines make very different kinds of errors. In some languages, protobuf actually reports specific errors like "end-group tag does not match start-group tag". In the vast majority of cases, this error really just means "the bytes are corrupted", but inevitably people think the error is trying to tell them something deeper which they do not understand. They then post questions to stack overflow like "How do I make sure my start-group and end-group tags match?" when they really should be comparing bytes between their source and destination to narrow down where they got corrupted. Even reporting the byte position where the parse error occurred is not very useful: protobuf is a dense encoding, which means that many random corrupt byte sequences will parse successfully, which means the parser may only notice a problem somewhere later down the line rather than at the point where things actually went wrong.
The one case that clearly is useful to distinguish is case 2 (missing required fields) -- at least, if you use required fields (I personally recommend avoiding them). There are a couple options here:
Normally, required field checks write errors to the console (on stderr). You can intercept these and record them your own way using SetLogHandler, but this doesn't give you structured information, only text messages.
To check required fields more programmatically, you can separate required field checking from parsing. Use MessageLite::ParsePartialFromArray() or one of the other Partial parsing methods to parse a message while ignoring the absence of required fields. You can then use the MessageLite::IsInitialized() to check if all required fields are set. If it returns false, use Message::FindInitializationErrors() to get a list of paths of all required fields that are missing.

protobuf required field and default value

I am new to protobuf and I have started considering the following trivial example
message Entry {
required int32 id = 1;
}
used by the c++ code
#include <iostream>
#include "example.pb.h"
int main() {
std::string mySerialized;
{
Entry myEntry;
std::cout << "Serialization succesfull "
<< myEntry.SerializeToString(&mySerialized) << std::endl;
std::cout << mySerialized.size() << std::endl;
}
Entry myEntry;
std::cout << "Deserialization successfull "
<< myEntry.ParseFromString(mySerialized) << std::endl;
}
Even if the "id" field is required, since it has not been set, the size of the serialization buffer is 0 (??).
When I deserialize the message an error occurs:
[libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse message of type "Entry" because it is missing required fields: id
Is it a normal behavior?
Francesco
ps- If I initialize "id" with the value 0, the behavior is different
pps- The SerializeToString function returns true, the ParseFromString returns false
I dont think I exactly understand your question, but I'll have a go at the answer anyways. Hope this helps you in some way or the other :)
Yes this is normal behavior. You should add required only if the field is important to the message. It makes sense semantically. (why would you skip a required field). To enforce this, protobuf would not parse the message.
It sees that the field marked with number 1 is required, and the has_id() method is returning false. So it wont parse the message at all.
In the developer guide it is advised not to use required fields.
Required Is Forever You should be very careful about marking fields as required. If at some point you wish to stop writing or sending a required field, it will be problematic to change the field to an optional field – old readers will consider messages without this field to be incomplete and may reject or drop them unintentionally. You should consider writing application-specific custom validation routines for your buffers instead. Some engineers at Google have come to the conclusion that using required does more harm than good; they prefer to use only optional and repeated. However, this view is not universal.
Also
Any new fields that you add should be optional or repeated. This means that any messages serialized by code using your "old" message format can be parsed by your new generated code, as they won't be missing any required elements. You should set up sensible default values for these elements so that new code can properly interact with messages generated by old code. Similarly, messages created by your new code can be parsed by your old code: old binaries simply ignore the new field when parsing. However, the unknown fields are not discarded, and if the message is later serialized, the unknown fields are serialized along with it – so if the message is passed on to new code, the new fields are still available. Note that preservation of unknown fields is currently not available for Python

More graceful error handling in C++ library - jsoncpp

I'm not sure if this will be a specific thing with jsoncpp or a general paradigm with how to make a C++ library behave better. Basically I'm getting this trace:
imagegeneratormanager.tsk: src/lib_json/json_value.cpp:1176: const Json::Value& Json::Value::operator[](const char*) const: Assertion `type_ == nullValue || type_ == objectValue' failed.
That happens when the input is bad. When the input - which is coming from another application of mine via memcached - happens to be bad, I would like to handle this error. You know, gracefully. Perhaps something like, "error: input for item 15006 is bad" going to the log. Not crashing my entire JSON-string-processing task.
Is this just a badly written library or is it possible to configure it more subtly?
Edit: here's some calling code:
Json::Value root;
Json::Reader reader;
succeeded = reader.parse(jsonString, root);
if(!succeeded) {
throw std::runtime_error(std::string("Failed to parse JSON for key ") + emailInfoKey.str());
}
std::string userEmail = root.get("userId", "").asString();
std::string bodyFilePath = root.get("bodyFilePath", "").asString();
std::string msgId = root.get("msgId", "").asString();
According to the library reference:
Value & Json::Value::operator[] ( const StaticString & key )
Access an object value by name, create a null member if it does not exist.
Seems you are trying to call operator[] on a non-object, say an integer or a string (get internally uses operator[]). You are breaking the function precondition, and its an error on your side of the code, not the library. You could check if the Json::Value is an object before accessing it as such using isObject().
As I see from the JsonCpp Sourceforge repo, right now assertions aren't catchable (however it seems to be in their backlog to make throwing assertions).
Then, you'll have to test if the input is valid before calling the [] operator.
A link to the source code of the newest revision (I don't know the version you have). See line 1141:
http://jsoncpp.svn.sourceforge.net/viewvc/jsoncpp/trunk/jsoncpp/src/lib_json/json_value.cpp?revision=249&view=markup

COM Error 0x80004003 (Invalid Pointer) access MS Outlook contacts

I am some ATL code that uses smart COM pointers to iterate through MS Outlook contacts, and on some PC's I am getting a COM error 0x80004003 ('Invalid Pointer') for each contact. The same code works fine on other PCs. The code looks like this:
_ApplicationPtr ptr;
ptr.CreateInstance(CLSID_Application);
_NameSpacePtr ns = ptr->GetNamespace(_T("MAPI"));
MAPIFolderPtr folder = ns->GetDefaultFolder(olFolderContacts);
_ItemsPtr items = folder->Items;
const long count = items->GetCount();
for (long i = 1; i <= count; i++)
{
try
{
_ContactItemPtr contactitem = items->Item(i);
// The following line throws a 0x80004003 exception on some machines
ATLTRACE(_T("\tContact name: %s\n"), static_cast<LPCTSTR>(contactitem->FullName));
}
catch (const _com_error& e)
{
ATLTRACE(_T("%s\n"), e.ErrorMessage());
}
}
I wonder if any other applications/add-ins could be causing this? Any help would be welcome.
FullName is a property and you do the GET operation (it's probably something like this in IDL: get_FullName([out,retval] BSTR *o_sResult)). Such operation works ok with null values.
My assumption is that contactItem smart pointer points to any valid COM object. In such case the formatting operation done by ATLTRACE can cause the problem. Internally it behaves probably like standard sprintf("",args...) function.
To avoid such problems just do something like below:
ATLTRACE(_T("\tContact name: %s\n"),
_bstr_t(contactitem->FullName)?static_cast<LPCTSTR>(contactitem->FullName):"(Empty)")
Just a guess:
Maybe the "FullName" field in the address book is empty and that's why the pointer is invalid?
hard to tell, because your code doesn't indicate which COM-interfaces you're using.
Does this make any difference?
ATLTRACE(_T("\tContact name: %s\n"), static_cast<LPCTSTR>(contactitem->GetFullName()));
In my example you format NULL value to a proper text value.
If the question is about the difference between FullName(as a property) and GetFullName() (as a method) then the answer is no. Property and method should give the same result. Sometimes property can be mapped to different methods then setXXX and getXXX. It can be achieved by using some specific syntax in IDL (and in reality in TLB after compilation of IDL to TLB). If property FullName is not mapped to method GetFullName then you will achieve different result.
So please examine file *.tlh after importing some type library to your project...