rapidjson - recursively change key value with nested field - c++

I have a JSON record with nested objects and arrays of objects. The keys in those fields contain spaces, and I want to change every space to _, so I have to iterate over all keys in the JSON object.
My idea is to write a depth-first search that visits all nested keys using ConstMemberIterator. My question is: how can I change a key, given its iterator?
The example below represents my idea:
void DFSReplaceSpaceInKeys(Value::ConstMemberIterator& itr) {
// Replace space in nested key
std::string s(itr->name.GetString());
std::replace(s.begin(), s.end(), ' ', '_');
// set new string to key
itr->name.SetString(s, ?.GetAllocator()); // <----- How can I get the document allocator?
std::cout << "new key: " << itr->name.GetString() << std::endl;
// recursive call in DFS
if (itr->value.GetType() == Type::kObjectType) {
DFSReplaceSpaceInKeys(itr->value.GetObject().MemberBegin());
}
}
A Json record example:
{
"a": {"b": [{"c": [...]}, {...}]}
}

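You can pass the allocator as a parameter. I also think it is better to pass a Value& to represent a node.
void DFSReplaceSpaceInKeys(Value& value, Value::AllocatorType& allocator) {
    if (value.IsObject()) {
        // Non-const iterator, because the key is modified in place
        for (Value::MemberIterator itr = value.MemberBegin(); itr != value.MemberEnd(); ++itr)
        {
            // Modify itr->name here, e.g. (needs <algorithm> and <string>):
            std::string s(itr->name.GetString(), itr->name.GetStringLength());
            std::replace(s.begin(), s.end(), ' ', '_');
            itr->name.SetString(s.c_str(), static_cast<SizeType>(s.size()), allocator);
            DFSReplaceSpaceInKeys(itr->value, allocator);
        }
    }
    else if (value.IsArray()) {
        // Non-const iterator, because nested keys may be modified
        for (Value::ValueIterator itr = value.Begin(); itr != value.End(); ++itr)
            DFSReplaceSpaceInKeys(*itr, allocator);
    }
}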
// ...
Document d;
DFSReplaceSpaceInKeys(d, d.GetAllocator());
If renaming keys is all you need, you can use the SAX API instead, which can be easier and faster. Check the capitalize example.

rapidjson::Document::AllocatorType& allocator = doc.GetAllocator();
auto news_obj= news_info["news_feature"].GetObject();
auto title_keyword = news_obj.FindMember ("title_keyword");
if (title_keyword != news_obj.MemberEnd()) {
title_keyword->name.SetString ("title_keywords", allocator);
}

Related

STL map.find returns all the elements

Having a problem with undefined behaviour of STL map defined as follows:
typedef bool (*SNAPSHOT_CALLBACK)(/*Some params here..*/);
typedef std::map<DWORD, SNAPSHOT_CALLBACK> SnapshotsMap;
SnapshotsMap m_mapCallbacks;
insertion:
AddCallback(DWORD snapshotType, SNAPSHOT_CALLBACK callback)
m_mapCallbacks.insert(std::pair<DWORD, SNAPSHOT_CALLBACK>(snapshotType, callback));
and query:
for (auto itr = m_mapCallbacks.find(cpyHeader->hdr); itr != m_mapCallbacks.end(); itr++)
{
itr->second();
}
The problem that I'm having is on a single key search the iterator retrieves both keys that I have inserted.
My logs:
Insert:
Added callback type: 21000b Callback: 615F5AE0
Added callback type: 210136 Callback: 615F5480
Query:
Same iterator loop:
Key to find: 21000b -> FOUND First: 21000b Second: 61da5ae0
Key to find: 21000b -> FOUND First: 210136 Second: 61da5480
for some reason both elements get retrieved and there's no other modifications/thread on this map.
Some help would be much appreciated :)
Query should be
// C++17 if construct
if (auto itr = m_mapCallbacks.find(cpyHeader->hdr); itr != m_mapCallbacks.end())
{
itr->second();
}
or
// pre-C++17 (but C++11 for auto)
auto itr = m_mapCallbacks.find(cpyHeader->hdr);
if (itr != m_mapCallbacks.end())
{
itr->second();
}
Your for loop iterates from the found key to the end of the map, so it only (potentially) skips the elements before the match; every element after the match is visited as well.
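To make the difference concrete, here is a minimal sketch that records which keys each approach visits. The function names visit_from_match and visit_match_only are made up for this illustration, and the map uses plain int values instead of callbacks.

```cpp
#include <map>
#include <vector>

// Mimics the loop from the question: starts at the match and keeps
// walking until end(), so later keys are visited too.
std::vector<int> visit_from_match(const std::map<int, int>& m, int key) {
    std::vector<int> visited;
    for (auto it = m.find(key); it != m.end(); ++it)
        visited.push_back(it->first);
    return visited;
}

// Single lookup: visits at most one element.
std::vector<int> visit_match_only(const std::map<int, int>& m, int key) {
    std::vector<int> visited;
    auto it = m.find(key);
    if (it != m.end())
        visited.push_back(it->first);
    return visited;
}
```

With the two keys from the log (0x21000b and 0x210136), the first function returns both keys when asked for 0x21000b, because std::map is ordered and 0x210136 sorts after it; the second returns only the requested key.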

partial lookup in key-value map where key itself is a key-value map

Suppose we have a data structure that is a key-value map, where the key itself is again a key-value map. For example:
map<map<string,string>, string>
Now, suppose that we want to query all top-level key/values in this map matching a certain subset of the key-values of the key. Example:
map = { { "k1" : "v1", "k2" : "v2" } : "value1",
{ "k1" : "v3", "k2" : "v4" } : "value2",
{ "k1" : "v1", "k2" : "v5" } : "value3"
}
And our query is "give me all key-values where the key contains { "k1" : "v1" }", and it would return the first and third value. Similarly, querying for { "k1" : "v3", "k2" : "v4" } would return all key-values that have both k1=v3 and k2=v4, yielding the second value. Obviously we could search through the full map on every query, but I'm looking for something more efficient than that.
I have looked around, but can't find an efficient, easy-to-use solution out there for C++. Boost multi_index does not seem to have this kind of flexibility in querying subsets of key-value pairs.
Some databases have ways to create indices that can answer exactly this kind of query. For example, Postgres has GIN indices (generalized inverted indices) that allow you to ask
SELECT * FROM table WHERE some_json_column @> '{"k1":"v1","k2":"v2"}'
-- returns all rows that have both k1=v1 and k2=v2
However, I'm looking for a solution without databases just in C++. Is there any library or data structure out there that can accomplish something like this? In case there is none, some pointers on a custom implementation?
I would stay with the database index analogy. In that analogy, the indexed search does not use a generic k=v type search, but just a tuple with the values for the elements (generally columns) that constitute the index. The database then reverts to scans for the other k=v parameters that are not in the index.
In that analogy, you would have a fixed number of keys, so the key could be represented as a fixed-size array of strings. The good news is that it is then trivial to set a global order on the keys, and thanks to the std::map::upper_bound method, it is also trivial to find an iterator immediately after a partial key.
So getting a full key is immediate: just extract it with find, at or operator []. And getting all elements for a partial key is still simple:
1. find an iterator starting above the partial key with upper_bound
2. iterate forward while the element matches the partial key
But this requires that you change your initial type to std::map<std::array<string, N>, string>
You could build an API over this container using std::map<string, string> as input values, extract the actual full or partial key from that, and iterate as above, keeping only elements matching the k,v pairs not present in index.
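The two steps above can be sketched as follows, assuming exactly two key components (k1, k2) and a query that fixes only the first one. The names Key, Table and query_by_k1 are hypothetical. This sketch uses lower_bound rather than upper_bound, so that an entry whose second component is the empty string is not skipped.

```cpp
#include <array>
#include <map>
#include <string>
#include <vector>

// Assumed fixed layout for this sketch: component 0 is k1, component 1 is k2.
using Key = std::array<std::string, 2>;
using Table = std::map<Key, std::string>;

// Return the values of all entries whose first key component equals k1.
std::vector<std::string> query_by_k1(const Table& table, const std::string& k1) {
    std::vector<std::string> out;
    // {k1, ""} compares less than or equal to every key that starts with k1,
    // because std::array compares lexicographically and "" is the smallest string.
    for (auto it = table.lower_bound(Key{k1, ""});
         it != table.end() && it->first[0] == k1; ++it)
        out.push_back(it->second);
    return out;
}
```

The lookup costs O(log n) plus the number of matches. A query that fixes only k2 cannot use the ordering and still needs a scan, which is the same limitation a database index has on its non-leading columns.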
You could use std::includes to check if key maps include another map of queried key-value pairs.
I am unsure how to avoid checking every key-map though. Maybe other answers have a better idea.
template <typename MapOfMapsIt, typename QueryMapIt>
std::vector<MapOfMapsIt> query_keymap_contains(
MapOfMapsIt mom_fst,
MapOfMapsIt mom_lst,
QueryMapIt q_fst,
QueryMapIt q_lst)
{
std::vector<MapOfMapsIt> out;
for(; mom_fst != mom_lst; ++mom_fst)
{
const auto& key_map = mom_fst->first; // bind by reference to avoid copying the inner map
if(std::includes(key_map.begin(), key_map.end(), q_fst, q_lst))
out.push_back(mom_fst);
}
return out;
}
Usage:
typedef std::map<std::string, std::string> StrMap;
typedef std::map<StrMap, std::string> MapKeyMaps;
MapKeyMaps m = {{{{"k1", "v1"}, {"k2", "v2"}}, "value1"},
{{{"k1", "v3"}, {"k2", "v4"}}, "value2"},
{{{"k1", "v1"}, {"k2", "v5"}}, "value3"}};
StrMap q1 = {{"k1", "v1"}};
StrMap q2 = {{"k1", "v3"}, {"k2", "v4"}};
auto res1 = query_keymap_contains(m.begin(), m.end(), q1.begin(), q1.end());
auto res2 = query_keymap_contains(m.begin(), m.end(), q2.begin(), q2.end());
std::cout << "Query1: ";
for(auto i : res1) std::cout << i->second << " ";
std::cout << "\nQuery2: ";
for(auto i : res2) std::cout << i->second << " ";
Output:
Query1: value1 value3
Query2: value2
I believe the efficiency of different methods will depend on actual data. However, I would consider making a "cache" of iterators to outer map elements for particular "kX","vY" pairs as follows:
using M = std::map<std::map<std::string, std::string>, std::string>;
M m = {
{ { { "k1", "v1" }, { "k2", "v2" } }, "value1" },
{ { { "k1", "v3" }, { "k2", "v4" } }, "value2" },
{ { { "k1", "v1" }, { "k2", "v5" } }, "value3" }
};
std::map<M::key_type::value_type, std::vector<M::iterator>> cache;
for (auto it = m.begin(); it != m.end(); ++it)
for (const auto& kv : it->first)
cache[kv].push_back(it);
Now, you basically need to take all searched "kX","vY" pairs and find the intersection of cached iterators for them:
std::vector<M::key_type::value_type> find_list = { { "k1", "v1" }, { "k2", "v5" } };
std::vector<M::iterator> found;
if (find_list.size() > 0) {
auto it = find_list.begin();
std::copy(cache[*it].begin(), cache[*it].end(), std::back_inserter(found));
while (++it != find_list.end()) {
const auto& temp = cache[*it];
found.erase(std::remove_if(found.begin(), found.end(),
[&temp](const auto& e){ return std::find(temp.begin(), temp.end(), e) == temp.end(); } ),
found.end());
}
}
The final output:
for (const auto& it : found)
std::cout << it->second << std::endl;
gives value3 in this case.
A live demo: https://wandbox.org/permlink/S9Zp8yofSvjfLokc.
Note that the complexity of the intersection step is quite large, since cached iterators are unsorted. If you use pointers instead, you can sort the vectors or store the pointers in a map instead, which would allow you to find intersections much faster, e.g., by using std::set_intersection.
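A minimal sketch of that pointer-based variant, under the assumption that the outer map is not modified after the cache is built (so the stored pointers stay valid). The names KV, Entry, Cache, build_cache and query_all are made up for this illustration.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Inner = std::map<std::string, std::string>;
using M = std::map<Inner, std::string>;
using KV = std::pair<std::string, std::string>;
using Entry = const M::value_type*;               // pointer to an outer map element
using Cache = std::map<KV, std::vector<Entry>>;

// Index every ("kX","vY") pair and keep each bucket sorted by address.
Cache build_cache(const M& m) {
    Cache cache;
    for (const auto& entry : m)
        for (const auto& kv : entry.first)
            cache[KV(kv.first, kv.second)].push_back(&entry);
    for (auto& bucket : cache)
        std::sort(bucket.second.begin(), bucket.second.end(), std::less<Entry>());
    return cache;
}

// Intersect the sorted buckets of all wanted pairs.
std::vector<Entry> query_all(const Cache& cache, const std::vector<KV>& wanted) {
    std::vector<Entry> found;
    bool first = true;
    for (const auto& kv : wanted) {
        auto it = cache.find(kv);
        if (it == cache.end()) return {};         // pair never seen: no match
        if (first) { found = it->second; first = false; continue; }
        std::vector<Entry> next;
        std::set_intersection(found.begin(), found.end(),
                              it->second.begin(), it->second.end(),
                              std::back_inserter(next), std::less<Entry>());
        found.swap(next);
    }
    return found;
}
```

Each intersection is linear in the sizes of the two buckets, instead of the quadratic std::find inside std::remove_if. The results come back in address order, not in map order.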
You can do it with a single (partial) pass through each element with an ordered query, returning early as much as possible. Taking inspiration from std::set_difference, we want to know whether the query is a subset of the data, which lets us select entries of the outer map.
// Is the sorted range [first1, last1) a subset of the sorted range [first2, last2)
template<class InputIt1, class InputIt2>
bool is_subset(InputIt1 first1, InputIt1 last1, InputIt2 first2, InputIt2 last2)
{
while (first1 != last1) {
if (first2 == last2) return false; // Reached the end of data with query still remaining
if (*first1 < *first2) {
return false; // didn't find this query element
} else {
if (! (*first2 < *first1)) {
++first1; // found this query element
}
++first2;
}
}
return true; // reached the end of query
}
// find every element of "map-of-maps" [first2, last2) for which the sorted range [first1, last1) is a subset of its key
template<class InputIt1, class InputIt2, class OutputIt>
OutputIt query_data(InputIt1 first1, InputIt1 last1, InputIt2 first2, InputIt2 last2, OutputIt d_first)
{
auto item_matches = [=](auto & inner){ return is_subset(first1, last1, inner.first.begin(), inner.first.end()); };
return std::copy_if(first2, last2, d_first, item_matches);
}
std::map is typically implemented as a balanced binary tree, which gives O(log n) look-ups. What you need instead is std::unordered_map, which is implemented as a hash table and has O(1) look-ups on average.
Now let me rephrase your wording. You want:
And our query is "give me all key-values where key contains { "k1" : "v1" }" and it would return the first and third value.
Which translates to:
If the key-value pair given is in the inner map, give me back its value.
Essentially what you need is a double look-up, which std::unordered_map excels at.
Here is a code snippet that solves your problem with the standard library (no fancy code required)
#include <iostream>
#include <unordered_map>
#include <string>
int main() {
using elemType = std::pair<std::string, std::string>;
using innerMap = std::unordered_map<std::string, std::string>;
using myMap = std::unordered_map<std::string, innerMap>;
auto table = myMap{ { "value1", { {"k1", "v1"}, {"k2", "v2"} } },
{ "value2", { {"k1", "v3"}, {"k2", "v4"} } },
{ "value3", { {"k1", "v1"}, {"k2", "v5"} } } };
//First we set-up a predicate lambda
auto printIfKeyValueFound = [](const myMap& tab, const elemType& query) {
// O(n) for the first table and O(1) lookup for each, O(n) total
for(const auto& el : tab) {
auto it = el.second.find(query.first);
if(it != el.second.end()) {
if(it->second == query.second) {
std::cout << "Element found: " << el.first << "\n";
}
}
}
};
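auto query = elemType{"k1", "v1"};
printIfKeyValueFound(table, query);
}
Output (std::unordered_map iteration order is unspecified, so the two lines may appear in either order):
Element found: value3
Element found: value1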
For queries of arbitrary size you can:
//First we set-up a predicate lambda
auto printIfKeyValueFound = [](const myMap& tab, const std::vector<elemType>& query) {
// O(n) over the table, O(m) over the query, O(1) average per lookup:
// O(n * m) total
for(const auto& el : tab) {
bool found = true;
for(const auto& queryEl : query) {
auto it = el.second.find(queryEl.first);
// A missing key must count as a mismatch too
if(it == el.second.end() || it->second != queryEl.second) {
found = false;
break;
}
}
if(found)
std::cout << el.first << "\n";
}
};
auto query = std::vector<elemType>{ {"k1", "v1"}, {"k2", "v2"} };
printIfKeyValueFound(table, query);
Output: value1

Iterate through a vector of objects and find a variable that matches one pulled from a text file

So I have a vector of objects
vector<Module*> moduleVector;
and I need to iterate through it and compare an attribute from the object to another attribute I'm pulling from a text file
I'm using an ifstream and getline() to store the element that needs to be compared to the object's attribute (fileD is the opened file, markModId is the string variable)
getline(fileD, markModId, ' ');
But I am unsure of how I can refer to the object's attributes in an iterator. So my question is,
how do I compare the attribute from the file to the object using an iterator?
For reference here is my object constructor (id is the attribute I want to compare)
Module::Module(string id, string title, string lecturer, int courseworkWeight)
{
code = id;
name = title;
lect = lecturer;
cwWeight = courseworkWeight;
exMark = 0; //ex mark initialised as 0
/*
Map to store coursework marks
*/
map<string, float> CWmarks;
//cwMarks.clear(); //cw marks map cleared
//create a map that stores
}
And exMark is the attribute that needs to be added to the object. All attributes in the Module constructor are private.
How do I compare the attribute from the file to the object using an iterator?
Short answer: if you have an iterator std::vector<Module*>::iterator iter, you can access the public members of the Module class like this:
(*iter)->/*public member*/;
Long answer: First of all, you need a getter for private member id and one setter for exMark, by which you can get the id of each Module and compare to the id from the file and then set its exMark to some value.
std::string getId()const { return code; }
void setExMark(const double newMark) { exMark = newMark; }
If you only want to change the first matching Module, you can use std::find_if to find it:
std::string idFromFile = "two";
auto Condition = [&idFromFile](Module* element){ return element->getId() == idFromFile; };
auto iter = std::find_if(moduleVector.begin(), moduleVector.end(), Condition);
if(iter != moduleVector.end())
(*iter)->setExMark(10.0); // see this
// ^^^^^^^^^
For multiple instances you can do:
for(auto iter = moduleVector.begin(); iter != moduleVector.end(); ++iter)
if ( (*iter)->getId() == idFromFile)
(*iter)->setExMark(10.0);
Note: In modern C++ you can use smart pointers instead of raw pointers; they delete the objects automatically when they go out of scope.
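As a rough sketch of the smart-pointer variant, the same std::find_if approach works on a std::vector<std::unique_ptr<Module>>. The Module class below is a cut-down stand-in with only the members needed here, and setMarkFor and getExMark are hypothetical helpers, not part of the original code.

```cpp
#include <algorithm>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Cut-down stand-in for the Module class from the question.
class Module {
public:
    explicit Module(std::string id) : code(std::move(id)) {}
    const std::string& getId() const { return code; }
    double getExMark() const { return exMark; }
    void setExMark(double newMark) { exMark = newMark; }
private:
    std::string code;
    double exMark = 0.0;
};

// Set the exam mark of the first module whose id matches.
// Returns false if no module has that id.
bool setMarkFor(std::vector<std::unique_ptr<Module>>& modules,
                const std::string& id, double mark) {
    auto iter = std::find_if(modules.begin(), modules.end(),
        [&id](const std::unique_ptr<Module>& m) { return m->getId() == id; });
    if (iter == modules.end())
        return false;
    (*iter)->setExMark(mark);   // *iter is the unique_ptr, -> reaches the Module
    return true;
}
```

The vector owns the modules, so no explicit delete is needed when it goes out of scope.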
Simply dereference the iterator to access its Module* pointer, then you can access the object using operator-> however you want, eg:
for (std::vector<Module*>::iterator iter = moduleVector.begin(), end = moduleVector.end(); iter != end; ++iter)
{
Module *m = *iter;
if (m->code == markModId)
m->exMark = ...;
}
Or, if you are using C++11 or later, let the compiler handle the iterator for you:
for (Module *m : moduleVector)
{
if (m->code == markModId)
m->exMark = ...;
}
Or, use a lambda with one of the standard iteration algorithms, eg:
std::for_each(moduleVector.begin(), moduleVector.end(),
[&](Module *m)
{
if (m->code == markModId)
m->exMark = ...;
}
);
If you are only interested in updating one Module, then break the loop when the desired Module is found:
for (std::vector<Module*>::iterator iter = moduleVector.begin(), end = moduleVector.end(); iter != end; ++iter)
{
Module *m = *iter;
if (m->code == markModId)
{
m->exMark = ...;
break; // <-- add this
}
}
for (Module *m : moduleVector)
{
if (m->code == markModId)
{
m->exMark = ...;
break; // <-- add this
}
}
auto iter = std::find_if(moduleVector.begin(), moduleVector.end(),
[&](Module *m) { return (m->code == markModId); });
if (iter != moduleVector.end())
{
Module *m = *iter;
m->exMark = ...;
}

RapidJSON get member name of Value

Wondering if it's possible to extract the name of a rapidjson::Value directly from it.
For instance, assume we have the following JSON data:
{
"name":
[
{ /*some data*/ },
{ /*some more data*/ }
]
}
And I retrieve the "name" array from it:
rapidjson::Value& myJSONArray = document["name"];
Can I retrieve "name" back from that Value? Something like this:
std::string memberName = myJSONArray.GetMemberName(); // returns "name"
No. It is not possible, because a Value does not store its own name: a Value (such as this array) need not be a member of an object at all.
You can use a member iterator instead.
Value::MemberIterator itr = document.FindMember("name");
string n = itr->name.GetString();
Value& v = itr->value;
Member iterators for an object have name and value properties:
std::pair<bool, std::string> iterate_items()
{
constexpr std::string_view stringJson = R"([ {"k1": "v1"}, {"k2": "v2"}, {"k3": "v3"}, {"k4": "v4"} ])";
// Wrap input stream for rapidjson reading
rapidjson::MemoryStream memorystreamFile( stringJson.data(), stringJson.length() );
rapidjson::Document documentJson; // Create root rapidjson object
documentJson.ParseStream( memorystreamFile ); // Parse json file
if( documentJson.IsArray() == true ) // Yes, we know it is an array :)
{
for( auto const& it : documentJson.GetArray() ) // iterate array
{
if( it.IsObject() == true ) // They are all objects
{
auto const& _name = it.MemberBegin()->name; // get name
auto const& _value = it.MemberBegin()->value; // get value
std::cout << _name.GetString() << _value.GetString() << "\n"; // dump it
}
}
}
return std::pair<bool, std::string>( true, std::string() );
}

Yaml-cpp (new API): Problems mixing maps and scalars in a sequence

I have a very simple problem parsing a yaml file of this form:
- Foo
- Bar:
    b1: 5
I would like to parse the top level keys as strings namely "Foo" and "Bar".
As you can see the first entry in the sequence is a scalar and the second is a map containing one key/value pair. Let's say I've loaded this YAML text into a node called config. I iterate over config in the following way:
YAML::Node::const_iterator n_it = config.begin();
for (; n_it != config.end(); n_it++) {
std::string name;
if (n_it->Type() == YAML::NodeType::Scalar)
name = n_it->as<std::string>();
else if (n_it->Type() == YAML::NodeType::Map) {
name = n_it->first.as<std::string>();
}
}
The problem is parsing the second "Bar" entry. I get the following yaml-cpp exception telling me I'm trying to access the key from a sequence iterator n_it.
YAML::InvalidNode: yaml-cpp: error at line 0, column 0: invalid node; this may result from using a map iterator as a sequence iterator, or vice-versa
If I change the access to this:
name = n_it->as<std::string>();
I get a different yaml-cpp exception which I guess is due to the fact that I'm trying to access the whole map as a std::string
YAML::TypedBadConversion<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >: yaml-cpp: error at line 0, column 0: bad conversion
Can somebody please explain to me what's going wrong?
Edit: new problems
I'm still having problems with this api's handling of maps vs sequences. Now say I have the following structure:
foo_map["f1"] = "one";
foo_map["f2"] = "two";
bar_map["b1"] = "one";
bar_map["b2"] = "two";
I want this to be converted to the following YAML file:
Node:
  - Foo:
      f1 : one
      f2 : two
  - Bar:
      b1 : one
      b2 : two
I would do so by doing:
node.push_back("Foo");
node["Foo"]["b1"] = "one";
...
node.push_back("Bar");
However at the last line node has now been converted from a sequence to a map and I get an exception. The only way I can do this is by outputting a map of maps:
Node:
  Foo:
    f1 : one
    f2 : two
  Bar:
    b1 : one
    b2 : two
The problem with this is that I cannot read back such files. If I iterate over Node, I'm unable to even get the type of the node iterator without getting an exception.
YAML::Node::const_iterator n_it = node.begin();
for (; n_it != config.end(); n_it++) {
if (n_it->Type() == YAML::NodeType::Scalar) {
// throws exception
}
}
This should be very simple to handle but has been driving me crazy!
In your expression
name = n_it->first.as<std::string>();
n_it is a sequence iterator (since it's an iterator for your top-level node), which you've just established points to a map. That is,
YAML::Node n = *n_it;
is a map node. This map node (in your example) looks like:
Bar:
  b1: 5
In other words, it has a single key/value pair, with the key a string, and the value a map node. It sounds like you want the string key. So:
assert(n.size() == 1); // Verify that there is, in fact, only one key/value pair
YAML::Node::const_iterator sub_it = n.begin(); // This iterator points to
// the single key/value pair
name = sub_it->first.as<std::string>();
Sample.yaml
config:
  key1: "SCALER_VAL" # SCALAR ITEM
  key2: ["val1", "val2"] # SEQUENCE ITEM
  key3: # MAP ITEM
    nested_key1: "nested_val"
Sample code for iterating over the YAML node (needs <yaml-cpp/yaml.h>, <iostream>, <sstream>, <iterator>, <algorithm>, <string> and <vector>):
YAML::Node internalconfig_yaml = YAML::LoadFile(configFileName);
const YAML::Node &node = internalconfig_yaml["config"];
for (const auto& it : node)
{
    std::cout << "\nnested Key: " << it.first.as<std::string>() << "\n";
    if (it.second.Type() == YAML::NodeType::Scalar)
    {
        // key1 holds a string, so read it as std::string rather than int
        std::cout << "\nnested value: " << it.second.as<std::string>() << "\n";
    }
    if (it.second.Type() == YAML::NodeType::Sequence)
    {
        std::vector<std::string> temp_vect;
        const YAML::Node &nestd_node2 = it.second;
        for (const auto& it2 : nestd_node2)
        {
            // Sequence elements are nodes themselves; only read scalars as strings
            if (it2.IsScalar())
            {
                std::cout << "\nnested sequence value: " << it2.as<std::string>() << "\n";
                temp_vect.push_back(it2.as<std::string>());
            }
        }
        std::ostringstream oss;
        std::copy(temp_vect.begin(), temp_vect.end(),
                  std::ostream_iterator<std::string>(oss, ","));
        std::cout << "\nnested sequence as string: " << oss.str() << "\n";
    }
    if (it.second.Type() == YAML::NodeType::Map)
    {
        // Iterate recursively over it.second
    }
}
This can also be done with a C++11 range-based for loop:
std::string name;
for (const auto &entry: node_x) {
assert(name.empty());
name = entry.first.as<std::string>();
}
The assertion will trigger if node_x is not what you expect: there should be exactly one entry in this map.
Try something like this:
- Foo: {}
- Bar:
    b1: 15