Map node names using pugixml for different inputs


Problem
My program spits out XML nodes from a file using pugixml. This is the bit of the code which does this:
for (auto& ea: mapa) {
std::cout << "Removed:" << std::endl;
ea.second.print(std::cout);
}
for (auto& eb: mapb) {
std::cout << "Added:" << std::endl;
eb.second.print(std::cout);
}
All nodes spat out should have this format (for example filea.xml):
<entry>
<id><![CDATA[9]]></id>
<description><![CDATA[Dolce 27 Speed]]></description>
</entry>
However what is spat out depends on how the input data is formatted. Sometimes the tags are called different things and I could end up with this (for example fileb.xml):
<entry>
<id><![CDATA[9]]></id>
<mycontent><![CDATA[Dolce 27 Speed]]></mycontent>
</entry>
Possible solution
Is it possible to define non-standard mappings (names of nodes) so that, no matter what the nodes are called in the input file, I always print them with std::cout in the same format (id and description)?
It seems like the answer is based on this code:
description = mycontent; // Define any non-standard maps
std::cout << node.set_name("notnode");
std::cout << ", new node name: " << node.name() << std::endl;
I'm new to C++ so any suggestions on how to implement this would be appreciated. I have to run this on tens of thousands of fields so performance is key.
Reference
https://pugixml.googlecode.com/svn/tags/latest/docs/manual/modify.html
https://pugixml.googlecode.com/svn/tags/latest/docs/samples/modify_base.cpp

Maybe something like this is what you're looking for?
#include <map>
#include <string>
#include <iostream>
#include "pugixml.hpp"
using namespace pugi;
int main()
{
// tag mappings
const std::map<std::string, std::string> tagmaps
{
{"odd-id-tag1", "id"}
, {"odd-id-tag2", "id"}
, {"odd-desc-tag1", "description"}
, {"odd-desc-tag2", "description"}
};
// working registers
std::map<std::string, std::string>::const_iterator found;
// loop through the nodes n here (`nodes` is a placeholder for your own range of pugi::xml_node)
for(auto&& n: nodes)
{
// change node name if mapping found
if((found = tagmaps.find(n.name())) != tagmaps.end())
n.set_name(found->second.c_str());
}
}
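
The lookup step in the answer above can be tried out without pugixml. The following is a minimal sketch, not the answerer's code: `canonical_name` and `default_tagmaps` are hypothetical helper names, and the single mapping is taken from the tag names in the question. It uses std::unordered_map on the assumption that average constant-time lookups suit the "tens of thousands of fields" case; whether that beats std::map in practice would need measuring.

```cpp
#include <string>
#include <unordered_map>

using TagMap = std::unordered_map<std::string, std::string>;

// Hypothetical helper (not part of pugixml): returns the canonical tag name
// for `raw`, or `raw` itself when no mapping is defined for it.
inline std::string canonical_name(const TagMap& tagmaps, const std::string& raw)
{
    auto found = tagmaps.find(raw);
    return found != tagmaps.end() ? found->second : raw;
}

// Example table built once and reused; the entry mirrors fileb.xml from the
// question, where <mycontent> should be printed as <description>.
inline const TagMap& default_tagmaps()
{
    static const TagMap maps{
        {"mycontent", "description"},
    };
    return maps;
}
```

With pugixml, the result would then be handed to the node, for example n.set_name(canonical_name(default_tagmaps(), n.name()).c_str()), before printing it.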

Related

Why can I access C++ map elements with a for loop but not individually?

I'm having a bit of a problem with the code below.
#include <iostream>
#include <fstream>
#include <string>
#include <map>
std::map<std::string, int> m; //Dictionary map
int main() {
std::ifstream dictionaryFile("dictionary.txt");
std::string str;
int probability = -1;
//Read dictionary.txt and assign to map
while(std::getline(dictionaryFile, str)) {
if(str.find("#!comment:") == std::string::npos) { //Not a comment
m.insert(std::pair<std::string, int>(str, probability));
}
else {
probability++;
}
}
dictionaryFile.close();
//Iterate and print through map -- THIS WORKS
std::map<std::string, int>::iterator pos;
for(pos = m.begin(); pos != m.end(); ++pos) {
std::cout << "Key: " << pos->first << std::endl;
std::cout << "Value: " << pos->second << "\n" << std::endl;
}
//Is "very" in the map? -- THIS DOES NOT WORK
std::cout << m.find("very")->second << std::endl;
if(m.find("very") != m.end()) {
std::cout << "found it" << std::endl;
} else {
std::cout << "did not find it" << std::endl;
}
}
I read in the "dictionary.txt" file and insert each word into a map. The value associated with each key is either 1 or 2, depending on the probability of the word.
I'm able to iterate through the map and print its elements from within a for loop, as shown. But I'm unable to access each element individually with m.find(), m.count(), or the [] operator. Each of those behaves as if the map were empty.
Do I have a piece of syntax wrong? Have I discovered a bug in std::map? Any help would be appreciated!
Here is dictionary.txt if you would like it.

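Your file contains Windows CRLF line endings \r\n. These are automatically translated into \n with the default istream processing on Windows. However, you are on a Linux system that will be treating your \r character as nothing particularly special, so every key read by std::getline ends with \r. The map holds "very\r", not "very", which is why the lookup fails while the loop still prints every entry.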
There are various ways around this. The simplest would be to not use such files as inputs on Linux. You can find answers elsewhere on this site for how to convert line-endings in the shell.
If you absolutely want your program to handle them, then you need to introduce some extra code. It can be as simple as checking the last character:
if (!str.empty() && str.back() == '\r')
str.pop_back();
For pre-C++11 standard library that doesn't have std::string::pop_back, you can just call str.erase(str.size()-1) instead.
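
Put together, the check above fits into the reading loop roughly as follows. This is a small sketch under the assumption that the input is read line by line with std::getline; `strip_trailing_cr` and `read_lines` are illustrative names, not from the question.

```cpp
#include <istream>
#include <sstream>  // only needed for the in-memory usage example below
#include <string>
#include <vector>

// Removes one trailing '\r', if present, so "very\r" becomes "very".
inline void strip_trailing_cr(std::string& line)
{
    if (!line.empty() && line.back() == '\r')
        line.pop_back();
}

// Reads every line from `in`, dropping the '\r' left over from CRLF endings.
inline std::vector<std::string> read_lines(std::istream& in)
{
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        strip_trailing_cr(line);
        lines.push_back(line);
    }
    return lines;
}
```

For a quick check, feed it a std::istringstream holding "very\r\nmuch\n": both lines come back without the carriage return, so they can be used as map keys directly.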

How to iterate through a specific key in a map containing vector as value?

How to iterate through the contents of map["a"] to retrieve call and call1 ?
std::vector<std::string> point
std::map<std::string, point> alloc
map["a"] = call, call1
map["i"] = call
I have tried a for loop over the map with an iterator, with a second for loop over the vector inside it, checking whether the map iterator's key equals "a", but I keep getting an error.

I think you are misunderstanding some of the language's syntax and the semantics of the standard library containers a little bit. I will explain what I think you are doing wrong.
The first thing is that point is a vector of string objects; it is an object, not a type. An object is a variable of some type, for example
string name = "curious";
Here name is an object of type/class string. You cannot pass point as a template argument to the map; you have to pass a type, which here should be std::vector<std::string>.
The second thing is that you are using the comma operator; I am not sure whether you knew that you were doing that. The comma operator works as follows
#include <iostream>
using std::cout;
using std::endl;
#include <string>
using std::string;
int main() {
cout << ("Hello", "World") << endl;
return 0;
}
^ this will typically generate a compiler warning (not an error) because "Hello" is not used, but the point is that the comma operator evaluates the left operand, discards its value, and then returns the operand on the right; so this will print
World
Third thing is how you iterate through the map. When you iterate through a std::map in C++ you are actually iterating through a series of std::pairs so the following code
#include <iostream>
using std::cout;
using std::endl;
#include <string>
using std::string;
#include <map>
using std::map;
int main() {
map<string, int> map_string_int {{"curious", 1}, {"op", 2}};
for (auto iter = map_string_int.begin(); iter != map_string_int.end();
++iter) {
cout << iter->first << " : " << iter->second << endl;
}
return 0;
}
will produce the following output
curious : 1
op : 2
The keys are ordered alphabetically because std::map keeps its keys sorted; it is usually implemented as a balanced binary search tree (https://en.wikipedia.org/wiki/Binary_search_tree).
Now I think you wanted to have a map from string objects to vectors, so you would structure your code as such
std::vector<string> point;
std::map<string, std::vector<string>> alloc;
alloc["a"] = {"call", "call1"};
alloc["i"] = {"call"};
and you would iterate through this like so
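for (auto iter = alloc.begin(); iter != alloc.end(); ++iter) {
cout << iter->first << " :";
// a vector has no operator<<, so print its elements one by one
for (const auto& value : iter->second) cout << " " << value;
cout << endl;
}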
You would iterate through alloc["a"] like so
// sanity check
assert(alloc.find("a") != alloc.end());
for (auto iter = alloc["a"].begin(); iter != alloc["a"].end(); ++iter) {
cout << *iter << endl;
}
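
One detail behind the sanity check above: operator[] on a std::map inserts a default-constructed value when the key is missing, so alloc["a"] can silently add an empty vector. A lookup through find() avoids that. The sketch below assumes the map type from this answer; `Alloc` and `values_for` are hypothetical names introduced for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

using Alloc = std::map<std::string, std::vector<std::string>>;

// Hypothetical helper: returns a copy of the values stored under `key`, or an
// empty vector when the key is absent. Unlike alloc[key], find() never
// inserts, so the map can be passed by const reference.
inline std::vector<std::string> values_for(const Alloc& alloc, const std::string& key)
{
    auto it = alloc.find(key);
    return it != alloc.end() ? it->second : std::vector<std::string>{};
}
```

Iterating the result with a range-based for loop then visits call and call1 for the key "a", and nothing at all for a key that was never stored.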
Hope that helped!

I assume you mean std::multimap instead of std::map, based on your use case (multiple values under the same key). It's in the same <map> header.
std::multimap<std::string, int> map;
map.insert(std::make_pair("first", 123));
map.insert(std::make_pair("first", 456));
auto result = map.equal_range("first");
for (auto it = result.first; it != result.second; ++it)
std::cout << " " << it->second;
Reference: std::multimap::equal_range

This should do what you want if I understand correctly.
std::vector<std::string> point = { "Hello", "World" };
std::map<std::string, decltype(point)> my_map;
//if you don't want to use decltype (or can't):
//std::map<std::string, std::vector<std::string>> my_map;
my_map["A"] = point;
my_map["B"] = { "Something", "Else" };
//this will iterate only through my_map["A"]
for (const auto &vector_val : my_map["A"])
std::cout << vector_val << std::endl;
//this will iterate through the whole map
for (const auto &map_pair : my_map)
{
std::cout << "map: " << map_pair.first << std::endl;
for (const auto &vector_val : map_pair.second)
std::cout << vector_val << std::endl;
std::cout << "---------------------------------" << std::endl;
}

I'm curious which is more suitable in such situations, i.e. a multimap or a map of vectors.
If someone wants to iterate sequentially over the vector associated with a particular key, or with all keys, in the map,
which will be more efficient/optimal?
map<string, vector<string>> mp;
// initialize your map...
for(auto itr=mp.begin(); itr!=mp.end(); ++itr)
for(auto itr2=itr->second.begin(); itr2!=itr->second.end(); ++itr2)
cout<<*itr2;
For a particular key, just replace the first loop with a lookup:
auto itr=mp.find(key);

Combinations of N Boost interval_set

I have a service which has outages in 4 different locations. I am modeling each location's outages as a Boost ICL interval_set. I want to know when at least N locations have an active outage.
Therefore, following this answer, I have implemented a combination algorithm, so I can create combinations between elements via interval_set intersections.
When this process is over, I should have a certain number of interval_sets, each one defining the outages for N locations simultaneously, and the final step will be joining them to get the desired full picture.
The problem is that I'm currently debugging the code, and when it comes time to print each intersection, the output text goes crazy (even when I step through with gdb) and I can't read the intersections; it also causes a lot of CPU usage.
I guess that somehow I'm sending a larger portion of memory to the output than I should, but I can't see where the problem is.
This is an SSCCE:
#include <boost/icl/interval_set.hpp>
#include <algorithm>
#include <iostream>
#include <vector>
int main() {
// Initializing data for test
std::vector<boost::icl::interval_set<unsigned int> > outagesPerLocation;
for(unsigned int j=0; j<4; j++){
boost::icl::interval_set<unsigned int> outages;
for(unsigned int i=0; i<5; i++){
outages += boost::icl::discrete_interval<unsigned int>::closed(
(i*10), ((i*10) + 5 - j));
}
std::cout << "[Location " << (j+1) << "] " << outages << std::endl;
outagesPerLocation.push_back(outages);
}
// So now we have a vector of interval_sets, one per location. We will combine
// them so we get an interval_set defined for those periods where at least
// 2 locations have an outage (N)
unsigned int simultaneusOutagesRequired = 2; // (N)
// Create a bool vector in order to filter permutations, and only get
// the sorted permutations (which equals the combinations)
std::vector<bool> auxVector(outagesPerLocation.size());
std::fill(auxVector.begin() + simultaneusOutagesRequired, auxVector.end(), true);
// Create a vector where combinations will be stored
std::vector<boost::icl::interval_set<unsigned int> > combinations;
// Get all the combinations of N elements
unsigned int numCombinations = 0;
do{
bool firstElementSet = false;
for(unsigned int i=0; i<auxVector.size(); i++){
if(!auxVector[i]){
if(!firstElementSet){
// First location, insert to combinations vector
combinations.push_back(outagesPerLocation[i]);
firstElementSet = true;
}
else{
// Intersect with the other locations
combinations[numCombinations] -= outagesPerLocation[i];
}
}
}
numCombinations++;
std::cout << "[-INTERSEC-] " << combinations[numCombinations] << std::endl; // The problem appears here
}
while(std::next_permutation(auxVector.begin(), auxVector.end()));
// Get the union of the intersections and see the results
boost::icl::interval_set<unsigned int> finalOutages;
for(std::vector<boost::icl::interval_set<unsigned int> >::iterator
it = combinations.begin(); it != combinations.end(); it++){
finalOutages += *it;
}
std::cout << finalOutages << std::endl;
return 0;
}
Any help?

As I surmised, there's a high-level approach here.
Boost ICL containers are more than just containers of "glorified pairs of interval start/end points". They are designed to handle exactly this business of combining and searching intervals in a generically optimized fashion.
So you don't have to.
If you let the library do what it's supposed to do:
using TimePoint = unsigned;
using DownTimes = boost::icl::interval_set<TimePoint>;
using Interval = DownTimes::interval_type;
using Records = std::vector<DownTimes>;
Using functional domain typedefs invites a higher level approach. Now, let's ask the hypothetical "business question":
What do we actually want to do with our records of per-location downtimes?
Well, we essentially want to
tally them for all discernible time slots and
filter those where tallies are at least 2
finally, we'd like to show the "merged" time slots that remain.
Ok, engineer: implement it!
Hmm. Tallying. How hard could it be?
❕ The key to elegant solutions is the choice of the right data structure
using Tally = unsigned; // or: bit mask representing affected locations?
using DownMap = boost::icl::interval_map<TimePoint, Tally>;
Now it's just bulk insertion:
// We will do a tally of affected locations per time slot
DownMap tallied;
for (auto& location : records)
for (auto& incident : location)
tallied.add({incident, 1u});
Ok, let's filter. We just need a predicate that works on our DownMap, right?
// define threshold where at least 2 locations have an outage
auto exceeds_threshold = [](DownMap::value_type const& slot) {
return slot.second >= 2;
};
Merge the time slots!
Actually, we just create another DownTimes set, right? Just not per location this time.
The choice of data structure wins the day again:
// just printing the union of any criticals:
DownTimes merged;
for (auto&& slot : tallied | filtered(exceeds_threshold) | map_keys)
merged.insert(slot);
Report!
std::cout << "Criticals: " << merged << "\n";
Note that nowhere did we come close to manipulating array indices, overlapping or non-overlapping intervals, closed or open boundaries. Or, [eeeeek!] brute force permutations of collection elements.
We just stated our goals, and let the library do the work.
Full Demo
#include <boost/icl/interval_set.hpp>
#include <boost/icl/interval_map.hpp>
#include <boost/range.hpp>
#include <boost/range/algorithm.hpp>
#include <boost/range/adaptors.hpp>
#include <boost/range/numeric.hpp>
#include <boost/range/irange.hpp>
#include <algorithm>
#include <iostream>
#include <vector>
using TimePoint = unsigned;
using DownTimes = boost::icl::interval_set<TimePoint>;
using Interval = DownTimes::interval_type;
using Records = std::vector<DownTimes>;
using Tally = unsigned; // or: bit mask representing affected locations?
using DownMap = boost::icl::interval_map<TimePoint, Tally>;
// Just for fun, removed the explicit loops from the generation too. Obviously,
// this is bit gratuitous :)
static DownTimes generate_downtime(int j) {
return boost::accumulate(
boost::irange(0, 5),
DownTimes{},
[j](DownTimes accum, int i) { return accum + Interval::closed((i*10), ((i*10) + 5 - j)); }
);
}
int main() {
// Initializing data for test
using namespace boost::adaptors;
auto const records = boost::copy_range<Records>(boost::irange(0,4) | transformed(generate_downtime));
for (auto location : records | indexed()) {
std::cout << "Location " << (location.index()+1) << " " << location.value() << std::endl;
}
// We will do a tally of affected locations per time slot
DownMap tallied;
for (auto& location : records)
for (auto& incident : location)
tallied.add({incident, 1u});
// We will combine them so we get an interval_set defined for those periods
// where at least 2 locations have an outage
auto exceeds_threshold = [](DownMap::value_type const& slot) {
return slot.second >= 2;
};
// just printing the union of any criticals:
DownTimes merged;
for (auto&& slot : tallied | filtered(exceeds_threshold) | map_keys)
merged.insert(slot);
std::cout << "Criticals: " << merged << "\n";
}
Which prints
Location 1 {[0,5][10,15][20,25][30,35][40,45]}
Location 2 {[0,4][10,14][20,24][30,34][40,44]}
Location 3 {[0,3][10,13][20,23][30,33][40,43]}
Location 4 {[0,2][10,12][20,22][30,32][40,42]}
Criticals: {[0,4][10,14][20,24][30,34][40,44]}
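
For readers who want to see the tallying idea without Boost, here is a rough standard-library sketch. It is not what Boost ICL does internally and it is not the code from this answer: it swaps in a plain sweep over start/end events. The names `at_least` and `Interval` are hypothetical, intervals are closed pairs of unsigned values, and the sketch assumes a threshold of at least 1, that the intervals within one location do not overlap each other, and that no interval ends at the largest unsigned value.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

using TimePoint = unsigned;
using Interval  = std::pair<TimePoint, TimePoint>;  // closed interval [first, second]

// Returns the merged closed intervals during which at least `threshold`
// locations have an outage at the same time.
inline std::vector<Interval> at_least(const std::vector<std::vector<Interval>>& records,
                                      unsigned threshold)
{
    // +1 where an outage starts, -1 one tick after it ends (intervals are closed).
    std::vector<std::pair<TimePoint, int>> events;
    for (const auto& location : records)
        for (const auto& incident : location) {
            events.emplace_back(incident.first, +1);
            events.emplace_back(incident.second + 1, -1);
        }
    std::sort(events.begin(), events.end());

    std::vector<Interval> result;
    int active = 0;       // number of locations currently down
    bool open = false;    // true while inside a critical stretch
    TimePoint start = 0;
    for (std::size_t i = 0; i < events.size();) {
        const TimePoint t = events[i].first;
        // Apply every event at time t before looking at the tally.
        while (i < events.size() && events[i].first == t) {
            active += events[i].second;
            ++i;
        }
        const bool critical = active >= static_cast<int>(threshold);
        if (critical && !open) {
            start = t;
            open = true;
        } else if (!critical && open) {
            result.emplace_back(start, t - 1);  // last tick that was still critical
            open = false;
        }
    }
    return result;
}
```

Fed the same test data as the demo (four locations, intervals [10i, 10i + 5 - j]) with a threshold of 2, this yields [0,4], [10,14], [20,24], [30,34], [40,44], matching the Criticals line above. The ICL version remains the better choice when intervals need the library's boundary handling.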

At the end of the permutation loop, you write:
numCombinations++;
std::cout << "[-INTERSEC-] " << combinations[numCombinations] << std::endl; // The problem appears here
My debugger tells me that on the first iteration numCombinations was 0 before the increment. Incrementing it put it out of range for the combinations container (which holds only a single element at that point, at index 0).
Did you mean to increment it after the use? Was there any particular reason not to use
std::cout << "[-INTERSEC-] " << combinations.back() << "\n";
or, for C++03
std::cout << "[-INTERSEC-] " << combinations[combinations.size()-1] << "\n";
or even just:
std::cout << "[-INTERSEC-] " << combinations.at(numCombinations) << "\n";
which would have thrown std::out_of_range?
On a side note, I think Boost ICL has vastly more efficient ways to get the answer you're after. Let me think about this for a moment. Will post another answer if I see it.
UPDATE: Posted the other answer showcasing high-level coding with Boost ICL

boost recognize a child

My question is related to: boost
Some of the Boost code correctly finds that a node has a child, but when one node has two other nodes it doesn't recognize the children.
It's a recursive call that reads all the tree nodes and then copies each value to the Google protocol buffer.
void ReadXML(iptree& tree, string doc)
{
const GPF* gpf= pMessage->GetGPF();
for(int i = 0 ; i < gpf->field_count(); ++i)
{
string fieldName = GetName(i);
boost::optional< iptree & > chl = pt.get_child_optional(fieldName);
if(chl) {
for( auto a : *chl ){
boost::property_tree::iptree subtree = (boost::property_tree::iptree) a.second ;
assignDoc(doc);
ReadXML(subtree, doc);
}
}
}
}
The XML file:
<?xml version="1.0" encoding="utf-8"?>
<nodeA xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<nodeA.1>This is the Adresse</nodeA.1>
<nodeA.2>
<node1>
<node1.1>
<node1.1.1>Female</node1.1.1>
<node1.1.2>23</node1.1.2>
<node1.1.3>Engineer</node1.1.3>
</node1.1>
<node1.1>
<node1.2.1>Female</node1.2.1>
<node1.2.2>35</node1.2.2>
<node1.2.3>Doctors</node1.2.3>
</node1.1>
</node1>
</nodeA.2>
<nodeA.3>Car 1</nodeA.3>
</nodeA>
My problem is that node1 is not recognised as having a child. I don't know whether that's because there are two child nodes with the same name.
Note that the XML files may change from one client to another. I may have different nodes.
Do I have to use a.second or a.first?

Here
boost::optional< iptree & > chl = pt.get_child_optional(fieldName);
you explicitly search for a child with a given name. This name never seems to change during recursion. On every level you look for children with the same name, it seems.

I think you could/should be looking at this problem from a higher level.
Boost Property Tree uses RapidXML under the hood. PugiXML is a similar, but more modern library that can also be used in header-only mode. With PugiXML you could write:
pugi::xml_document doc;
doc.load(iss);
for (auto& node : doc.select_nodes("*/descendant::*[count(*)=3]/*[count(*)=0]/.."))
{
auto values = node.node().select_nodes("*/text()");
std::cout << "Gender " << values[0].node().value() << "\n";
std::cout << "Age " << values[1].node().value() << "\n";
std::cout << "Job Title " << values[2].node().value() << "\n";
}
It selects all descendants of the root node (nodeA) that have three leaf child nodes, and interprets those as Gender, Age and Job Title. It prints:
Gender Female
Age 23
Job Title Engineer
Gender Female
Age 35
Job Title Doctors
I hope you will find this constructive.
Full Demo
To build on my system, simply:
sudo apt-get install libpugixml-dev
g++ -std=c++11 demo.cpp -lpugixml -o demo
./demo
demo.cpp:
#include <pugiconfig.hpp>
#define PUGIXML_HEADER_ONLY
#include <pugixml.hpp>
#include <iostream>
#include <sstream>
int main()
{
std::istringstream iss("<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
"<nodeA xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
"<nodeA.1>This is the Adresse</nodeA.1>"
"<nodeA.2>"
"<node1>"
"<node1.1>"
"<node1.1.1>Female</node1.1.1>"
"<node1.1.2>23</node1.1.2>"
"<node1.1.3>Engineer</node1.1.3>"
"</node1.1>"
"<node1.2>"
"<node1.2.1>Female</node1.2.1>"
"<node1.2.2>35</node1.2.2>"
"<node1.2.3>Doctors</node1.2.3>"
"</node1.2>"
"</node1>"
"</nodeA.2>"
"<nodeA.3>Car 1</nodeA.3>"
"</nodeA>");
pugi::xml_document doc;
doc.load(iss);
for (auto& node : doc.select_nodes("*/descendant::*[count(*)=3]/*[count(*)=0]/.."))
{
auto values = node.node().select_nodes("*/text()");
std::cout << "Gender " << values[0].node().value() << "\n";
std::cout << "Age " << values[1].node().value() << "\n";
std::cout << "Job Title " << values[2].node().value() << "\n";
}
//doc.save(std::cout);
}

PugiXML C++ getting content of an element (or a tag)

I'm using PugiXML in C++ with Visual Studio 2010 to get the content of an element, but it stops reading the value as soon as it sees a "<", even when that "<" does not close the element. I want it to read up to the element's closing tag, and at least return the text inside the inner tags even if the tags themselves are ignored.
I would also like to know how to get the outer XML. For example, if I fetch the element
pugi::xpath_node_set tools = doc.select_nodes("/mesh/bounds/b");
what do I do to get its whole content, which would be " Link Till here"
The content is the same as given below:
#include "pugixml.hpp"
#include <iostream>
#include <conio.h>
#include <stdio.h>
using namespace std;
int main() {
string source = "<mesh name='sphere'><bounds><b id='hey'> <a DeriveCaptionFrom='lastparam' name='testx' href='http://www.google.com'>Link Till here<b>it will stop here and ignore the rest</b> text</a></b> 0 1 1</bounds></mesh>";
int from_string;
from_string = 1;
pugi::xml_document doc;
pugi::xml_parse_result result;
string filename = "xgconsole.xml";
result = doc.load_buffer(source.c_str(), source.size());
/* result = doc.load_file(filename.c_str());
if(!result){
cout << "File " << filename.c_str() << " couldn't be found" << endl;
_getch();
return 0;
} */
pugi::xpath_node_set tools = doc.select_nodes("/mesh/bounds/b/a[@href='http://www.google.com' and @DeriveCaptionFrom='lastparam']");
for (pugi::xpath_node_set::const_iterator it = tools.begin(); it != tools.end(); ++it) {
pugi::xpath_node node = *it;
std::cout << "Attribute Href: " << node.node().attribute("href").value() << endl;
std::cout << "Value: " << node.node().child_value() << endl;
std::cout << "Name: " << node.node().name() << endl;
}
_getch();
return 0;
}
here is the output:
Attribute Href: http://www.google.com
Value: Link Till here
Name: a
I hope I was clear enough,
Thanks in advance

My psychic powers tell me you want to know how to get the concatenated text of all children of the node (aka inner text).
The easiest way to do that is to use XPath like this:
pugi::xml_node node = doc.child("mesh").child("bounds").child("b");
string text = pugi::xpath_query(".").evaluate_string(node);
Obviously you can write your own recursive function that concatenates the PCDATA/CDATA values from the subtree; using a built-in recursive traversing facility, such as find_node, would also work (using C++11 lambda syntax):
string text;
node.find_node([&](pugi::xml_node n) -> bool { if (n.type() == pugi::node_pcdata || n.type() == pugi::node_cdata) text += n.value(); return false; });
Now, if you want to get the entire contents of the tag (aka outer xml), you can output a node to string stream, i.e.:
ostringstream oss;
node.print(oss);
string xml = oss.str();
Getting inner xml will require iterating through node's children and appending their outer xml to the result, i.e.
ostringstream oss;
for (pugi::xml_node_iterator it = node.begin(); it != node.end(); ++it)
it->print(oss);
string xml = oss.str();

That's how XML works. You can't embed < or > right in your values. Escape them (e.g. using entities like &lt; and &gt;) or define a CDATA section.

I've struggled a lot with parsing a subtree including all its elements and sub-nodes. The easiest way is almost what is shown here.
You should use this code:
ostringstream oss;
oNode.print(oss, "", format_raw);
sResponse = oss.str();
Replace oNode with the node that you want, and prefix names such as format_raw with pugi:: unless you have a using-directive for that namespace.
