RapidJSON - Resolve key conflicts without information loss (C++)

I want to parse a text file whose format is similar to JSON. After some character conversions, it still contains objects with key conflicts. So my JSON looked like this:
{
"key1": {
"a": "asdf",
"a": "foo",
"a": "bar",
"a": "fdas"
}
}
And I wanted to resolve it into this:
{
"key1": {
"a": [
"asdf",
"foo",
"bar",
"fdas"
]
}
}
I tried to achieve this with JsonCpp, but it can't handle the key conflicts. So I chose RapidJSON, mainly because it CAN keep all the conflicting members when parsing.
To then resolve the key conflicts without losing information, I wrote the following recursive RapidJSON C++ code:
void resolveKeyConflicts(rj::Value& value) {
if (value.IsObject()) {
std::map<std::string, unsigned int> nameCount;
for (rj::Value::MemberIterator vMIt = value.MemberBegin();
vMIt != value.MemberEnd(); vMIt++) {
std::string name(vMIt->name.GetString());
if (nameCount.find(name) == nameCount.end()) {
nameCount[name] = 1;
} else {
nameCount[name] += 1;
}
}
for (std::map<std::string, unsigned int>::iterator nCIt =
nameCount.begin(); nCIt != nameCount.end(); nCIt++) {
if (nCIt->second > 1) {
rj::Value newArray(rj::kArrayType);
for (rj::Value::MemberIterator vFMIt = value.FindMember(
nCIt->first.c_str()); vFMIt != value.MemberEnd();
vFMIt++) {
if (vFMIt->name.GetString() == nCIt->first) {
rj::Value value(vFMIt->value, this->GetAllocator());
newArray.PushBack(value, this->GetAllocator());
}
}
value.EraseMember(value.FindMember(nCIt->first.c_str()),
value.MemberEnd());
rj::Value key(nCIt->first.c_str(), nCIt->first.length(),
this->GetAllocator());
value.AddMember(key, newArray, this->GetAllocator());
}
}
for (rj::Value::MemberIterator vMIt = value.MemberBegin();
vMIt != value.MemberEnd(); vMIt++) {
if (vMIt->value.IsObject() || vMIt->value.IsArray()) {
resolveKeyConflicts(vMIt->value);
}
}
} else if (value.IsArray()) {
for (rj::Value::ValueIterator vVIt = value.Begin(); vVIt != value.End();
vVIt++) {
resolveKeyConflicts(*vVIt);
}
}
}
This works pretty well as long as the conflicting members are the only members in that object. That case could probably be handled with simpler code, but I also wanted to be able to resolve arbitrary key conflicts like this:
{
"key2": {
"a": "asdf",
"b": "foo",
"b": "bar",
"c": "fdas"
}
}
Into this:
{
"key2": {
"a": "asdf",
"b": [
"foo",
"bar"
],
"c": "fdas"
}
}
It turns out that FindMember does not, as I had assumed, return an iterator over all members with the same key name, but only the position of the first member with that key. I think my Python way of thinking may have skewed my expectations of FindMember. As written, the code loses the "c": "fdas" member.
I relied on MemberIterator EraseMember(MemberIterator first, MemberIterator last) because all of the other methods for removing a member mentioned in http://rapidjson.org/md_doc_tutorial.html#ModifyObject seem to have problems removing the last member in the key1 case. But using EraseMember like this is definitely the wrong choice for the key2 case.
So I'm kind of lost here. Can somebody please point me in the right direction for resolving the key conflicts without information loss, in a way that handles both the key1 and the key2 case?
edit: I'm using RapidJSON from https://github.com/miloyip/rapidjson/tree/v1.0.2 which is at the v1.0.2 tag.

I think the tricky part is remembering whether a key has already been expanded into an array (because the original value may itself be an array).
So another way is to first convert every key: value into key: [value], do the merge, and then convert back to key: value wherever the array has only one element.
This is my attempt:
static void MergeDuplicateKey(Value& v, Value::AllocatorType& a) {
if (v.IsObject()) {
// Convert all key:value into key:[value]
for (Value::MemberIterator itr = v.MemberBegin(); itr != v.MemberEnd(); ++itr)
itr->value = Value(kArrayType).Move().PushBack(itr->value, a);
// Merge arrays if key is duplicated
for (Value::MemberIterator itr = v.MemberBegin(); itr != v.MemberEnd();) {
Value::MemberIterator itr2 = v.FindMember(itr->name);
if (itr != itr2) {
itr2->value.PushBack(itr->value[0], a);
itr = v.EraseMember(itr);
}
else
++itr;
}
// Convert key:[values] back to key:value if there is only one value
for (Value::MemberIterator itr = v.MemberBegin(); itr != v.MemberEnd(); ++itr) {
if (itr->value.Size() == 1)
itr->value = itr->value[0];
MergeDuplicateKey(itr->value, a); // Recursion on the value
}
}
else if (v.IsArray())
for (Value::ValueIterator itr = v.Begin(); itr != v.End(); ++itr)
MergeDuplicateKey(*itr, a);
}
I tested it in this commit.
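The grouping idea itself can be sketched without RapidJSON. The following is only an illustration on a plain list of key/value pairs; `Member`, `Grouped` and `groupDuplicateKeys` are hypothetical names, and string values stand in for arbitrary JSON values:

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the members of a JSON object that may repeat keys.
using Member = std::pair<std::string, std::string>;
using Grouped = std::vector<std::pair<std::string, std::vector<std::string>>>;

// Collects the values of duplicate keys under the first occurrence of each key,
// keeping keys in the order in which they first appear.
Grouped groupDuplicateKeys(const std::vector<Member>& members) {
    Grouped result;
    for (const Member& m : members) {
        bool merged = false;
        for (auto& entry : result) {
            if (entry.first == m.first) {
                entry.second.push_back(m.second);  // key seen before: append value
                merged = true;
                break;
            }
        }
        if (!merged)
            result.emplace_back(m.first, std::vector<std::string>{m.second});
    }
    return result;
}
```

An entry whose value list has exactly one element corresponds to a key without conflicts, which is the case the final "convert back" step above handles.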

I completely rewrote that part, trying (again) another approach. I think I found a pretty elegant solution:
void resolveKeyConflicts(rj::Value& value) {
if (value.IsObject()) {
std::vector<std::string> resolvedConflicts;
rj::Value newValue(rj::kObjectType);
for (rj::Value::MemberIterator vMIt = value.MemberBegin();
vMIt != value.MemberEnd(); vMIt++) {
rj::Value::MemberIterator nVFMIt = newValue.FindMember(vMIt->name);
if (nVFMIt == newValue.MemberEnd()) {
rj::Value newKey(vMIt->name, this->GetAllocator());
newValue.AddMember(newKey, vMIt->value, this->GetAllocator());
} else {
std::string conflict(vMIt->name.GetString(),
vMIt->name.GetStringLength());
if (std::find(resolvedConflicts.begin(),
resolvedConflicts.end(), conflict)
== resolvedConflicts.end()) {
rj::Value newArray(rj::kArrayType);
nVFMIt->value.Swap(newArray);
nVFMIt->value.PushBack(newArray, this->GetAllocator());
nVFMIt->value.PushBack(vMIt->value, this->GetAllocator());
resolvedConflicts.push_back(conflict);
} else {
nVFMIt->value.PushBack(vMIt->value, this->GetAllocator());
}
}
}
value.SetNull().SetObject();
for (rj::Value::MemberIterator nVMIt = newValue.MemberBegin();
nVMIt != newValue.MemberEnd(); nVMIt++) {
if (nVMIt->value.IsObject() || nVMIt->value.IsArray()) {
this->resolveKeyConflicts(nVMIt->value);
}
value.AddMember(nVMIt->name, nVMIt->value, this->GetAllocator());
}
} else if (value.IsArray()) {
for (rj::Value::ValueIterator vVIt = value.Begin(); vVIt != value.End();
vVIt++) {
if (vVIt->IsObject() || vVIt->IsArray()) {
this->resolveKeyConflicts(*vVIt);
}
}
}
}
I'm not so sure about the value.SetNull().SetObject() part for emptying value, but it works.
If you see room for improvement, just let me know where. Thanks.

Related

Is there a simple way of refactoring this code?

I have a function that contains very similar, repeated code. I would like to refactor it, but I don't want any complex mapping code.
The code basically filters records in a table by column values. I kept this example simple by comparing simple types, but the real comparison can be more complex.
I am hoping there is some template or lambda technique that can do this.
vector<MyRecord*>& MyDb::Find(bool* field1, std::string * field2, int* field3)
{
std::vector<MyRecord*>::iterator iter;
filterList_.clear();
std::copy(list_.begin(), list_.end(), back_inserter(filterList_));
if (field1)
{
iter = filterList_.begin();
while (iter != filterList_.end())
{
MyRecord* rec = *iter;
if (rec->field1 != *field1)
{
iter = filterList_.erase(iter); // erase invalidates iter; continue from the returned iterator
continue;
}
iter++;
}
}
if (field2)
{
iter = filterList_.begin();
while (iter != filterList_.end())
{
MyRecord* rec = *iter;
if (rec->field2 != *field2)
{
iter = filterList_.erase(iter); // erase invalidates iter; continue from the returned iterator
continue;
}
iter++;
}
}
if (field3)
{
iter = filterList_.begin();
while (iter != filterList_.end())
{
MyRecord* rec = *iter;
if (rec->field3 != *field3)
{
iter = filterList_.erase(iter); // erase invalidates iter; continue from the returned iterator
continue;
}
iter++;
}
}
return filterList_;
}
Update: Just in case someone is curious, this is my final code. Thanks again, everyone. It is a lot easier to understand and maintain.
vector<MyRecord*>& MyDb::Find(bool* field1, std::string* field2, int* field3)
{
auto compare = [&](MyRecord* rec) {
bool add = true;
if (field1 && rec->field1 != *field1) {
add = false;
}
if (field2 && rec->field2 != *field2) {
add = false;
}
if (field3 && rec->field3 != *field3) {
add = false;
}
return add;
};
filterList_.clear();
std::copy_if(list_.begin(), list_.end(), back_inserter(filterList_), compare);
return filterList_;
}
You can use std::copy_if (since you are already making a copy anyway):
vector<MyRecord*>& MyDb::Find(bool* field1, std::string* field2, int* field3){
filterList_.clear();
std::copy_if(list_.begin(), list_.end(), back_inserter(filterList_),[&](MyRecord* rec){
// whatever condition you want; copy_if keeps the records for which this returns true
return !field3 || rec->field3 == *field3;
});
return filterList_;
}
Is there a simple way of refactoring this code?
As far as I understand your algorithm and intention, you can replace all of the while loops with std::erase_if (C++20) as follows:
#include <vector> // std::erase_if
std::vector<MyRecord*> // return by copy as filterList_ is local to function scope
Find(bool* field1 = nullptr, std::string* field2 = nullptr, int* field3 = nullptr)
{
std::vector<MyRecord*> filterList_{ list_ }; // copy of original
const auto erased = std::erase_if(filterList_, [=](MyRecord* record) {
return record
&& ((field1 && record->field1 != *field1)
|| (field2 && record->field2 != *field2)
|| (field3 && record->field3 != *field3));
}
);
return filterList_;
}
If C++20 is not available, you can use the erase-remove idiom instead, which is in effect what std::erase_if does under the hood.
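A minimal sketch of the erase-remove variant, assuming a `MyRecord` with the three fields used in the question. The struct definition and the extra `list` parameter are assumptions made here to keep the example self-contained; in the original, `list_` is a class member:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record type mirroring the fields used in the question.
struct MyRecord {
    bool field1;
    std::string field2;
    int field3;
};

// Returns the records that match every filter that is not null.
std::vector<MyRecord*> Find(const std::vector<MyRecord*>& list,
                            const bool* field1 = nullptr,
                            const std::string* field2 = nullptr,
                            const int* field3 = nullptr) {
    std::vector<MyRecord*> filtered(list);  // copy of the original
    auto mismatch = [=](const MyRecord* rec) {
        return (field1 && rec->field1 != *field1)
            || (field2 && rec->field2 != *field2)
            || (field3 && rec->field3 != *field3);
    };
    // std::remove_if moves the kept elements to the front and returns the new
    // logical end; erase then trims the leftover tail.
    filtered.erase(std::remove_if(filtered.begin(), filtered.end(), mismatch),
                   filtered.end());
    return filtered;
}
```

This compiles as C++11 and keeps the relative order of the remaining records.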

How to remove key from poco json while iterating it?

How do I remove a key from a Poco JSON object while iterating over it? Like:
Poco::JSON::Object::Ptr poco_json;
for (auto& objs : *poco_json)
{
// do something
if (objs.first == "specific key")
poco_json->remove(key);
}
or
Poco::JSON::Object::Ptr poco_json;
for(auto it = poco_json->begin();it != poco_json->end();)
{
// do something
if (it->first == "specific key")
it = poco_json->remove(it->first);//error : poco didn't have something like this
else
++it;
}
The problem is that removing a key from the JSON object invalidates the iterator. I know that for std::map, erase returns a valid iterator for the next iteration, but I can't find anything similar for Poco JSON.
Since C++11, std::map::erase returns an iterator to the next element. Before C++11, you would erase items this way:
for (auto it = m.begin(); it != m.end(); ) {
if (it->first == someKey)
m.erase(it++); // use post-increment,pass copy of iterator, advance it
else
++it;
}
You can do it in a similar way when erasing a key from Poco::JSON::Object. Where did you read that remove invalidates iterators?
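For comparison, the C++11 form of the same loop on a plain std::map might look like this. It is a sketch only; `eraseKey` is a hypothetical helper name and the `int` mapped type is arbitrary:

```cpp
#include <map>
#include <string>

// Erases the entry whose key equals someKey while iterating (C++11 and later).
// someKey must not refer to a key stored inside m, because that string is
// destroyed by the erase.
void eraseKey(std::map<std::string, int>& m, const std::string& someKey) {
    for (auto it = m.begin(); it != m.end(); ) {
        if (it->first == someKey)
            it = m.erase(it);  // erase returns the iterator after the removed element
        else
            ++it;
    }
}
```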
Some snippet code from source:
class JSON_API Object {
typedef std::map<std::string, Dynamic::Var> ValueMap; // <--- map
//...
Iterator begin();
/// Returns begin iterator for values.
Iterator end();
/// Returns end iterator for values.
void remove(const std::string& key);
/// Removes the property with the given key.
ValueMap _values; // <---
};
inline Object::Iterator Object::begin()
{
return _values.begin();
}
inline Object::Iterator Object::end()
{
return _values.end();
}
inline void Object::remove(const std::string& key)
{
_values.erase(key); // <--- erase is called on the map, so only iterators to the erased element are invalidated
if (_preserveInsOrder)
{
KeyList::iterator it = _keys.begin();
KeyList::iterator end = _keys.end();
for (; it != end; ++it)
{
if (key == (*it)->first)
{
_keys.erase(it);
break;
}
}
}
_modified = true;
}
You could rewrite your loop into:
for(auto it = poco_json->begin();it != poco_json->end();)
{
// do something
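if (it->first == "specific key")
{
const std::string key = it->first; // copy the key: the element is about to be erased
++it; // advance before erasing so that it stays valid
poco_json->remove(key);
}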
else
++it;
}
EDIT
Why your code doesn't work with a range-based for loop:
for (auto& objs : *poco_json)
{
// do something
if (objs.first == "specific key")
poco_json->remove(key);
}
It is roughly equivalent to:
for (auto it = poco_json->begin(); it != poco_json->end(); ++it)
{
// do something
if (it->first == "specific key")
poco_json->remove(it->first);
// remove is called, it is erased from inner map
// ++it is called on iterator which was invalidated,
// code crashes
}
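You could also modify remove in Poco itself so that it returns the next iterator (a sketch against the source shown above, not an existing Poco API; the declaration in the class would have to change accordingly):
inline Object::Iterator Object::remove(const std::string& key)
{
ValueMap::iterator pos = _values.find(key);
if (pos == _values.end())
return _values.end(); // nothing to remove
if (_preserveInsOrder)
{
KeyList::iterator it = _keys.begin();
KeyList::iterator end = _keys.end();
for (; it != end; ++it)
{
if (key == (*it)->first)
{
_keys.erase(it);
break;
}
}
}
_modified = true;
// map::erase(key) returns a count; erase(iterator) returns the next iterator (C++11)
return _values.erase(pos);
}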

JSON string comparison

I am working on a C++ project where I need to compare two or more JSON strings that are passed to a function as arguments, and return a bool accordingly. I am using Jsoncpp, but I am unable to compare the two JSON documents in their entirety. I want to know the best way to loop over the keys and values and check each value against the corresponding value in the other JSON string (both strings are passed to the function and parsed with reader.parse() of Jsoncpp; then I need to compare them and return the bool). Can anyone help me with this, please? Thank you in advance.
This is where I am stuck:
class test {
public:
static bool isequalstring(const std::string &item1, const std::string
&item2, const std::string &temp) {
Document d1;
d1.Parse(item1.c_str());
Document d2;
d2.Parse(item2.c_str());
Document d3;
d3.Parse(temp.c_str());
bool matched = true;
//itr= iterate through the third json to get the keys and match the keys in first and second
for (auto itr = d3.MemberBegin(); itr != d3.MemberEnd(); itr++) {
if (d1.HasMember(itr->name) && d2.HasMember(itr->name)) { // if the member doesn't exist in both, break
if (d1[itr->name] != d2[itr->name]) {
// value doesn't match, then break
matched = false;
break;
}
} else {
matched = false;
break;
}
}
return matched;
}
};
bool testDeepNestedJson_should_succeed(){
bool expectedTestResult = true;
bool testResult;
// Input 1 JSON Object
const char* input1 = "{\"array\":[1,2,3],\"boolean\":true,\"null\":null,\"number\":123,\"object\":{\"a\":\"b\",\"c\":\"d\",\"e\":\"f\"},\"string\":\"Hello World\",\"object_array\":[{\"key\":\"value1\"},{\"key\":\"value2\"},{\"key\":\"value3\"}],\"deep_nested_array\":[{\"object_array\":[{\"key\":\"value1\"},{\"key\":\"value2\"},{\"key\":\"value3\"}]},{\"object_array\":[{\"key\":\"value4\"},{\"key\":\"value5\"},{\"key\":\"value6\"}]}]}";
const char* input2 = "{\"array\":[1,2,3],\"justsomedata\":true,\"boolean\":true,\"null\":null,\"object\":{\"a\":\"b\",\"c\":\"d\",\"e\":\"f\"},\"number\":123,\"object_array\":[{\"key\":\"value1\"},{\"key\":\"value2\"},{\"key\":\"value3\"}],\"deep_nested_array\":[{\"object_array\":[{\"key\":\"value1\"},{\"key\":\"value2\"},{\"key\":\"value3\"}]},{\"object_array\":[{\"key\":\"value4\"},{\"key\":\"value5\"},{\"key\":\"value6\",\"ignoreme\":12346}]}],\"string\":\"Hello World\"}";
const char* stencil = "{\"array\":[null],\"boolean\":null,\"null\":null,\"object\":{\"a\":null,\"c\":null,\"e\":null},\"number\":null,\"object_array\":[{\"key\":null}],\"deep_nested_array\":[{\"object_array\":[{\"key\":null}]}],\"string\":null}";
testResult = test::isequalstring(input1, input2, stencil);
if(testResult != expectedTestResult){
std::cout<<"testDeepNestedJson_should_succeed:"<<std::endl;
std::cout<<"Item1:"<<input1<<std::endl;
std::cout<<"Item2:"<<input2<<std::endl;
std::cout<<"Stencil:"<<stencil<<std::endl;
std::cout<<"Test Failed result is: False expected was: True"<<std::endl;
return false;
}
std::cout<<"PASSED: testDeepNestedJson_should_succeed"<<std::endl;
return true;
}
int main() {
testDeepNestedJson_should_succeed();
return 0;
}
Using RapidJSON, the code would look something like this:
#include "rapidjson/document.h"
#include "rapidjson/writer.h"
#include "rapidjson/stringbuffer.h"
#include <iostream>
using namespace rapidjson;
//isEqualItem({id: 1, name: "test", randomNo: 1}, {id: 1, name: "test", randomNo: 1}, {id: null, name: null, randomNo: null}) //should assert true
//isEqualItem({id: 1, name: "test", randomNo: 1}, {id: 1, name: "test", randomNo: 2}, {id: null, name: null, randomNo: null}) //should assert false
//isEqualItem({id: 1, name: "test", randomNo: 1}, {id: 1, name: "test", randomNo: 3}, {id: null, name: null}) //should assert true
bool is_same(const std::string& s1, const std::string& s2, const std::string& s3) {
Document d1;
d1.Parse(s1.c_str());
Document d2;
d2.Parse(s2.c_str());
Document d3;
d3.Parse(s3.c_str());
bool matched = true;
// iterate through the third json to get the keys and match the keys in first and second
for (Value::ConstMemberIterator itr = d3.MemberBegin(); itr != d3.MemberEnd(); itr++) {
if (d1.HasMember(itr->name) && d2.HasMember(itr->name)) { // if the member doesn't exist in both, break
if (d1[itr->name] != d2[itr->name]) { // value doesn't match, then break
matched = false;
break;
}
}
else {
matched = false;
break;
}
}
return matched;
}
int main() {
// 1. Parse a JSON string into DOM.
const char* json = "{\"id\":1,\"name\":\"test\",\"randomNo\":1}";
const char* json2 = "{\"id\":1,\"name\":\"test\",\"randomNo\":2}";
const char* keys = "{\"id\":\"null\",\"name\":\"null\"}";
if (is_same(json, json2,keys)) {
std::cout << "Both are same" << std::endl;
}
return 0;
}
You could iterate over root2, get the key names with name(), look up the corresponding values in root and root1 with operator[], and compare them with operator!=:
for (auto it = root2.begin(); it != root2.end(); ++it) {
auto name = it.name();
if (root[name] != root1[name])
return false;
}
return true;
BTW, you parse item1 into root, item2 into root1 and temp into root2. More consistent names would make the code easier to follow.

Modification to STL List Contents in C++

In the code snippet below, I am trying to modify the contents of each list stored in the map mp by having getlist return a pointer to the list that corresponds to a given key. I am aware that the list contents could be modified directly instead of calling getlist first, but I am new to STL and C++ and am trying to learn by playing around with iterators and lists.
When the code below is executed, a segmentation fault occurs at the line "(*lit) = 10". Can anyone help me understand what's going wrong here?
static void getlist(int num,map<int,list<int>> mp, list<int>** l_ptr )
{
map<int,list<int>>::iterator it = mp.begin();
while( it != mp.end())
{
if(it->first == num )
{
*l_ptr = &(it->second);
return;
}
it++;
}
}
int main()
{
map<int,list<int>> mp;
mp[1] = {2,2,2};
mp[2] = {3,3,3};
mp[3] = {4,4,4};
map<int,list<int>>::iterator it = mp.begin();
list<int>::iterator lit;
list<int>* r_l = new list<int>;
//getlist(it->first,mp,r_l);
while( it != mp.end())
{
getlist(it->first,mp,&r_l);
lit = r_l->begin();
while(lit != r_l->end())
{
(*lit) = 10;
lit++;
}
it++;
}
it = mp.begin();
while( it != mp.end())
{
lit = (it->second).begin();
while(lit != (it->second).end())
{
cout<<(*lit);
lit++;
}
it++;
}
return 0;
}
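No answer is recorded for this question here. One likely cause, offered as a sketch rather than a confirmed diagnosis: getlist takes mp by value, so *l_ptr ends up pointing into a temporary copy of the map that is destroyed when getlist returns, and dereferencing it afterwards is undefined behavior. Taking the map by reference avoids this. The names below follow the question, except `fillAll`, which is a hypothetical helper added for illustration:

```cpp
#include <list>
#include <map>

// Takes the map by reference, so the returned pointer refers to the caller's
// list and not to a temporary copy. Returns nullptr when the key is absent.
std::list<int>* getlist(int num, std::map<int, std::list<int>>& mp) {
    auto it = mp.find(num);
    return it == mp.end() ? nullptr : &it->second;
}

// Sets every element of every list in the map to value, going through getlist.
void fillAll(std::map<int, std::list<int>>& mp, int value) {
    for (auto& entry : mp) {
        std::list<int>* l = getlist(entry.first, mp);
        if (!l)
            continue;
        for (int& x : *l)
            x = value;
    }
}
```

With this shape there is also no need for the initial new list<int> in main, which the original code never frees.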

Parsing object inside array in rapidjson

I'm having problems implementing a recursive function that walks the tree I get from parsing a JSON input.
Example JSON input:
{
"attr" : { "a": 1, "ovec": [ { "b": 2, "c": 3 }, { "d": 4} ] }
}
This is what we call a 'compound value of an attribute', and the value is simply a JSON doc. Its content is completely arbitrary (as long as its valid JSON).
The problem is that with a Vector I have to loop using the type Value::ConstValueIterator (unlike for Object, where I use Value::ConstMemberIterator).
My recursive function has Value::ConstMemberIterator as parameter and all is OK until I encounter a Vector/Object inside a Vector - for the recursive call I'd need an iterator of the type Value::ConstMemberIterator.
Relevant parts of the "traversing" function:
int parseContextAttributeCompoundValue
(
const Value::ConstMemberIterator& node
)
{
std::string type = jsonParseTypeNames[node->value.GetType()];
if (type == "Array")
{
for (Value::ConstValueIterator iter = node->value.Begin(); iter != node->value.End(); ++iter)
{
std::string nodeType = jsonParseTypeNames[iter->GetType()]; // iter points to a Value, which has no 'value' member
if (nodeType == "String")
{
val = iter->GetString();
}
// else if ...
if ((nodeType == "Object") || (nodeType == "Array"))
{
// Here's my problem - need to convert 'iter' to Value::ConstMemberIterator
// in order to recursively call parseContextAttributeCompoundValue for this object/array
parseContextAttributeCompoundValue(iter); // COMPILATION ERROR
}
}
}
else if (type == "Object")
{
for (Value::ConstMemberIterator iter = node->value.MemberBegin(); iter != node->value.MemberEnd(); ++iter)
{
std::string nodeType = jsonParseTypeNames[iter->value.GetType()];
if (nodeType == "String")
{
val = iter->value.GetString();
}
else if (nodeType == "Number")
{
if ((nodeType == "Object") || (nodeType == "Array"))
{
// Here I'm just fine as iter is of the desired type already
parseContextAttributeCompoundValue(iter);
}
}
}
}
I've tried a few things, like calling iter->value.MemberBegin() to "convert" to the desired type, but so far without any success.
I'd be more than thankful for some help here ...
You can simply have the function take a Value instead of an iterator:
void parseContextAttributeCompoundValue(const Value& v) {
if (v.IsObject()) {
// ...
}
else if (v.IsArray()) {
// ...
}
}
And then at the call sites:
for (Value::ConstValueIterator iter = ...) {
parseContextAttributeCompoundValue(*iter);
}
for (Value::ConstMemberIterator iter = ...) {
parseContextAttributeCompoundValue(iter->value);
}