Parsing XML file with RapidXML - only parsing first line of files - c++

I am having trouble with RapidXML: it appears to parse only the first line of my file. When I feed in a sample file, it gets the first node (“map”) and nothing else. I set a breakpoint in Xcode after parsing to inspect the result, and most of the attributes seem to be NULL. Does anyone have recommendations on how to fix this? My understanding is that the parser is supposed to produce some form of tree-like structure. Perhaps I misunderstand the resulting data structure?
Here is my usage:
#include <iostream>
#include "rapidxml_utils.hpp"
using namespace std;
int main(){
rapidxml::file<> xmlFile("sample.txt.xml");
rapidxml::xml_document<> doc;
doc.parse<0>(xmlFile.data());
cout << "Name of my first node is: " << doc.first_node()->name() << "\n";
rapidxml::xml_node<> *node = doc.first_node("map");
cout << "Node map has value " << node->value() << "\n";
for (rapidxml::xml_attribute<> *attr = node->first_attribute();
attr; attr = attr->next_attribute())
{
cout << "Node foobar has attribute " << attr->name() << " ";
cout << "with value " << attr->value() << "\n";
}
}
Here is an example of the file I am trying to parse:
<?xml version="1.0" encoding="utf-8"?>
<map>
<room>
<name>Entrance</name>
<description>You find yourself at the mouth of a cave</description>
<item>torch</item>
<trigger>
<type>permanent</type>
<command>n</command>
<condition>
<has>no</has>
<object>torch</object>
<owner>inventory</owner>
</condition>
<print>*stumble* need some light...</print>
</trigger>
<border>
<direction>north</direction>
<name>MainCavern</name>
</border>
</room>
</map>

You're confusing XML attributes and elements.
Attributes would look like this: <map name="Zork" author="Infocom">
If you want to iterate over all elements in the tree, you need a recursive algorithm that uses RapidXML's first_node() and next_sibling() methods.

Related

Map node names using pugixml for different inputs

Problem
My program spits out XML nodes from a file using pugixml. This is the bit of the code which does this:
for (auto& ea: mapa) {
std::cout << "Removed:" << std::endl;
ea.second.print(std::cout);
}
for (auto& eb: mapb) {
std::cout << "Added:" << std::endl;
eb.second.print(std::cout);
}
All nodes spat out should have this format (for example filea.xml):
<entry>
<id><![CDATA[9]]></id>
<description><![CDATA[Dolce 27 Speed]]></description>
</entry>
However what is spat out depends on how the input data is formatted. Sometimes the tags are called different things and I could end up with this (for example fileb.xml):
<entry>
<id><![CDATA[9]]></id>
<mycontent><![CDATA[Dolce 27 Speed]]></mycontent>
</entry>
Possible solution
Is it possible to define non-standard mappings (names of nodes) so that, no matter what the nodes are called in the input file, I always std::cout them in the same format (id and description)?
It seems like the answer is based on this code:
description = mycontent; // Define any non-standard maps
std::cout << node.set_name("notnode");
std::cout << ", new node name: " << node.name() << std::endl;
I'm new to C++ so any suggestions on how to implement this would be appreciated. I have to run this on tens of thousands of fields so performance is key.
Reference
https://pugixml.googlecode.com/svn/tags/latest/docs/manual/modify.html
https://pugixml.googlecode.com/svn/tags/latest/docs/samples/modify_base.cpp
Maybe something like this is what you're looking for?
#include <map>
#include <string>
#include <iostream>
#include "pugixml.hpp"
using namespace pugi;
int main()
{
// tag mappings
const std::map<std::string, std::string> tagmaps
{
{"odd-id-tag1", "id"}
, {"odd-id-tag2", "id"}
, {"odd-desc-tag1", "description"}
, {"odd-desc-tag2", "description"}
};
// working registers
std::map<std::string, std::string>::const_iterator found;
// loop through the nodes n here (`nodes` is a placeholder for your own range of pugi::xml_node, e.g. the children of an entry)
for(auto&& n: nodes)
{
// change node name if mapping found
if((found = tagmaps.find(n.name())) != tagmaps.end())
n.set_name(found->second.c_str());
}
}
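Since the question stresses performance, here is a rough variant of the same lookup using a hash map instead of std::map. This is only a sketch: canonical_tag is a hypothetical helper name, "mycontent" is taken from fileb.xml in the question, and "odd-id-tag1" is a placeholder.

```cpp
#include <string>
#include <unordered_map>

// Returns the canonical tag for a non-standard one, or the input unchanged
// when no mapping exists. The table is built once, on first use.
std::string canonical_tag(const std::string& name)
{
    static const std::unordered_map<std::string, std::string> tagmaps{
        {"mycontent", "description"},  // from fileb.xml in the question
        {"odd-id-tag1", "id"},         // placeholder for other odd names
    };
    const auto found = tagmaps.find(name);
    return found != tagmaps.end() ? found->second : name;
}
```

Inside the loop above this would be used as n.set_name(canonical_tag(n.name()).c_str());. Returning by value keeps the helper safe to call with temporaries, at the cost of one string copy per node.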

Pugixml C++ parsing XML

I am new to pugixml. Consider the XML given here. I want to get the values of Name and Roll for every student. The code below only finds the tags, not their values.
#include <iostream>
#include "pugixml.hpp"
int main()
{
std::string xml_mesg = "<data> \
<student>\
<Name>student 1</Name>\
<Roll>111</Roll>\
</student>\
<student>\
<Name>student 2</Name>\
<Roll>222</Roll>\
</student>\
<student>\
<Name>student 3</Name>\
<Roll>333</Roll>\
</student>\
</data>";
pugi::xml_document doc;
doc.load_string(xml_mesg.c_str());
pugi::xml_node data = doc.child("data");
for(pugi::xml_node_iterator it=data.begin(); it!=data.end(); ++it)
{
for(pugi::xml_node_iterator itt=it->begin(); itt!=it->end(); ++itt)
std::cout << itt->name() << " " << std::endl;
}
return 0;
}
I want to output Name and Roll for each student. How can I modify the code above? Also, as one can see here (press Test), I can write XPath directly, which pugixml supports. If so, how can I get the values I want using XPath in pugixml?
Here's how you can do it with just Xpath:
pugi::xpath_query student_query("/data/student");
pugi::xpath_query name_query("Name/text()");
pugi::xpath_query roll_query("Roll/text()");
pugi::xpath_node_set xpath_students = doc.select_nodes(student_query);
for (pugi::xpath_node xpath_student : xpath_students)
{
// Since Xpath results can be nodes or attributes, you must explicitly get
// the node out with .node()
pugi::xml_node student = xpath_student.node();
pugi::xml_node name = student.select_node(name_query).node();
pugi::xml_node roll = student.select_node(roll_query).node();
std::cout << "Student name: " << name.value() << std::endl;
std::cout << " roll: " << roll.value() << std::endl;
}
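I think the reason you are getting the tags/nodes instead of their values is that you are calling name() rather than value(). Note, though, that in pugixml an element node has no value of its own (its text lives in a child PCDATA node), so itt->value() returns an empty string here; try itt->child_value() or itt->text().get() instead.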
I found some good documentation about accessing document data here
Thanks @Cornstalks for the insight on using XPath in pugixml. I used child_value, described here. My code became:
for(pugi::xml_node_iterator it=data.begin(); it!=data.end(); ++it)
{
for(pugi::xml_node_iterator itt=it->begin(); itt!=it->end(); ++itt)
std::cout << itt->name() << " " << itt->child_value() << " " << std::endl;
}
I could also use XPath as @Cornstalks suggested, making my code:
pugi::xml_document doc;
doc.load_string(xml_mesg.c_str());
pugi::xpath_query student_query("/data/student");
pugi::xpath_query name_query("Name/text()");
pugi::xpath_query roll_query("Roll/text()");
pugi::xpath_node_set xpath_students = doc.select_nodes(student_query);
for (pugi::xpath_node xpath_student : xpath_students)
{
// Since Xpath results can be nodes or attributes, you must explicitly get
// the node out with .node()
pugi::xml_node student = xpath_student.node();
pugi::xml_node name = student.select_node(name_query).node();
pugi::xml_node roll = student.select_node(roll_query).node();
std::cout << "Student name: " << name.value() << std::endl;
std::cout << " roll: " << roll.value() << std::endl;
}
In your inner loop, change the output line as follows to get the values (student 1, 111, and so on):
std::cout << itt->text().get() << " " << std::endl;

boost recognize a child

My question is related to : boost
Some of my Boost code correctly detects that a node has a child, but when a node has two child nodes it does not recognize the children.
The function is called recursively so that it can read every node of the tree and then copy each value into a Google Protocol Buffers message.
void ReadXML(iptree& tree, string doc)
{
const GPF* gpf= pMessage->GetGPF();
for(int i = 0 ; i < gpf->field_count(); ++i)
{
string fieldName = GetName(i);
boost::optional< iptree & > chl = pt.get_child_optional(fieldName);
if(chl) {
for( auto a : *chl ){
boost::property_tree::iptree subtree = (boost::property_tree::iptree) a.second ;
assignDoc(doc);
ReadXML(subtree, doc);
}
}
}
}
the XML file
<?xml version="1.0" encoding="utf-8"?>
<nodeA xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<nodeA.1>This is the Adresse</nodeA.1>
<nodeA.2>
<node1>
<node1.1>
<node1.1.1>Female</node1.1.1>
<node1.1.2>23</node1.1.2>
<node1.1.3>Engineer</node1.1.3>
</node1.1>
<node1.1>
<node1.2.1>Female</node1.2.1>
<node1.2.2>35</node1.2.2>
<node1.2.3>Doctors</node1.2.3>
</node1.1>
</node1>
</nodeA.2>
<nodeA.3>Car 1</nodeA.3>
</nodeA>
My problem is that node1 is not recognised as having children. I don't know if it's because there are two child nodes with the same name.
Note that the XML files may change from one client to another. I may have different nodes.
Do I have to use a.second or a.first?
Here
boost::optional< iptree & > chl = pt.get_child_optional(fieldName);
you explicitly search for a child with a given name. This name never seems to change during recursion: on every level you appear to look for children with the same name.
I think you could/should be looking at this problem from a higher level.
Boost Property Tree uses RapidXML under the hood. PugiXML is a similar, but more modern library that can also be used in header-only mode. With PugiXML you could write:
pugi::xml_document doc;
doc.load(iss);
for (auto& node : doc.select_nodes("*/descendant::*[count(*)=3]/*[count(*)=0]/.."))
{
auto values = node.node().select_nodes("*/text()");
std::cout << "Gender " << values[0].node().value() << "\n";
std::cout << "Age " << values[1].node().value() << "\n";
std::cout << "Job Title " << values[2].node().value() << "\n";
}
It selects all descendants of the root node (nodeA) that have three leaf child nodes, and interprets them as Gender, Age and Job Title. It prints:
Gender Female
Age 23
Job Title Engineer
Gender Female
Age 35
Job Title Doctors
I hope you will find this constructive.
Full Demo
On my system to build, simply:
sudo apt-get install libpugixml-dev
g++ -std=c++11 demo.cpp -lpugixml -o demo
./demo
demo.cpp:
#include <pugiconfig.hpp>
#define PUGIXML_HEADER_ONLY
#include <pugixml.hpp>
#include <iostream>
#include <sstream>
int main()
{
std::istringstream iss("<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
"<nodeA xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
"<nodeA.1>This is the Adresse</nodeA.1>"
"<nodeA.2>"
"<node1>"
"<node1.1>"
"<node1.1.1>Female</node1.1.1>"
"<node1.1.2>23</node1.1.2>"
"<node1.1.3>Engineer</node1.1.3>"
"</node1.1>"
"<node1.2>"
"<node1.2.1>Female</node1.2.1>"
"<node1.2.2>35</node1.2.2>"
"<node1.2.3>Doctors</node1.2.3>"
"</node1.2>"
"</node1>"
"</nodeA.2>"
"<nodeA.3>Car 1</nodeA.3>"
"</nodeA>");
pugi::xml_document doc;
doc.load(iss);
for (auto& node : doc.select_nodes("*/descendant::*[count(*)=3]/*[count(*)=0]/.."))
{
auto values = node.node().select_nodes("*/text()");
std::cout << "Gender " << values[0].node().value() << "\n";
std::cout << "Age " << values[1].node().value() << "\n";
std::cout << "Job Title " << values[2].node().value() << "\n";
}
//doc.save(std::cout);
}

Build an xml tree from scratch - pugixml C++

First, I would like to say that I have been using an XML parser written by Frank Vanden Berghen and am now trying to migrate to pugixml. I am finding the transition a bit difficult and am hoping to get some help here.
Question: How can I build a tree from scratch for the small XML below using the pugixml API? I looked at the examples on the pugixml home page, but most of them hard-code the root node values. What I mean is that
if (!doc.load("<node id='123'>text</node><!-- comment -->", pugi::parse_default | pugi::parse_comments)) return -1;
is hard-coded. I also read the xml_document and xml_node documentation, but could not figure out where to start if I have to build a tree from scratch.
#include "pugixml.hpp"
#include <string.h>
#include <iostream>
int main()
{
pugi::xml_document doc;
if (!doc.load("<node id='123'>text</node><!-- comment -->", pugi::parse_default | pugi::parse_comments)) return -1;
//[code_modify_base_node
pugi::xml_node node = doc.child("node");
// change node name
std::cout << node.set_name("notnode");
std::cout << ", new node name: " << node.name() << std::endl;
// change comment text
std::cout << doc.last_child().set_value("useless comment");
std::cout << ", new comment text: " << doc.last_child().value() << std::endl;
// we can't change value of the element or name of the comment
std::cout << node.set_value("1") << ", " << doc.last_child().set_name("2") << std::endl;
//]
//[code_modify_base_attr
pugi::xml_attribute attr = node.attribute("id");
// change attribute name/value
std::cout << attr.set_name("key") << ", " << attr.set_value("345");
std::cout << ", new attribute: " << attr.name() << "=" << attr.value() << std::endl;
// we can use numbers or booleans
attr.set_value(1.234);
std::cout << "new attribute value: " << attr.value() << std::endl;
// we can also use assignment operators for more concise code
attr = true;
std::cout << "final attribute value: " << attr.value() << std::endl;
//]
}
// vim:et
XML:
<?xml version="1.0" encoding="UTF-8"?>
<d:testrequest xmlns:d="DAV:" xmlns:o="urn:example.com:testdrive">
<d:basicsearch>
<d:select>
<d:prop>
<o:versionnumber/>
<d:creationdate />
</d:prop>
</d:select>
<d:from>
<d:scope>
<d:href>/</d:href>
<d:depth>infinity</d:depth>
</d:scope>
</d:from>
<d:where>
<d:like>
<d:prop>
<o:name />
</d:prop>
<d:literal>%img%</d:literal>
</d:like>
</d:where>
</d:basicsearch>
</d:testrequest>
Most of the examples I found show how to read/parse XML, but I could not find one that shows how to create a document from scratch.
Please refer to the following section of the manual https://github.com/zeux/pugixml/blob/master/docs/manual.html#manual.modify.add and to the following sample code https://github.com/zeux/pugixml/blob/master/docs/samples/modify_add.cpp
Home page of pugixml gives sample code for building XML tree from scratch.
Summary: Use default constructor for pugi::xml_document doc, then append_child for the root node. Generally, a node is first inserted. The insertion call's return value then serves as a handle for filling the XML node.
Constructing xml tree

PugiXML C++ getting content of an element (or a tag)

I'm using PugiXML in C++ with Visual Studio 2010 to get the content of an element, but it stops reading the value as soon as it sees a "<" character, even when that "<" does not close the element. I want it to read up to the element's closing tag, ignoring the inner tags if necessary but at least keeping the text inside them.
I would also like to know how to get the outer XML. For example, if I fetch the element
pugi::xpath_node_set tools = doc.select_nodes("/mesh/bounds/b");
what do I do to get the whole content which would be " Link Till here"
this content is the same given down here:
#include "pugixml.hpp"
#include <iostream>
#include <conio.h>
#include <stdio.h>
using namespace std;
int main() {
string source = "<mesh name='sphere'><bounds><b id='hey'> <a DeriveCaptionFrom='lastparam' name='testx' href='http://www.google.com'>Link Till here<b>it will stop here and ignore the rest</b> text</a></b> 0 1 1</bounds></mesh>";
int from_string;
from_string = 1;
pugi::xml_document doc;
pugi::xml_parse_result result;
string filename = "xgconsole.xml";
result = doc.load_buffer(source.c_str(), source.size());
/* result = doc.load_file(filename.c_str());
if(!result){
cout << "File " << filename.c_str() << " couldn't be found" << endl;
_getch();
return 0;
} */
pugi::xpath_node_set tools = doc.select_nodes("/mesh/bounds/b/a[@href='http://www.google.com' and @DeriveCaptionFrom='lastparam']");
for (pugi::xpath_node_set::const_iterator it = tools.begin(); it != tools.end(); ++it) {
pugi::xpath_node node = *it;
std::cout << "Attribute Href: " << node.node().attribute("href").value() << endl;
std::cout << "Value: " << node.node().child_value() << endl;
std::cout << "Name: " << node.node().name() << endl;
}
_getch();
return 0;
}
here is the output:
Attribute Href: http://www.google.com
Value: Link Till here
Name: a
I hope I was clear enough,
Thanks in advance
My psychic powers tell me you want to know how to get the concatenated text of all children of the node (aka inner text).
The easiest way to do that is to use XPath like this:
pugi::xml_node node = doc.child("mesh").child("bounds").child("b");
string text = pugi::xpath_query(".").evaluate_string(node);
Obviously you can write your own recursive function that concatenates the PCDATA/CDATA values from the subtree; using a built-in recursive traversal facility, such as find_node, would also work (using C++11 lambda syntax):
string text;
node.find_node([&](pugi::xml_node n) -> bool { if (n.type() == pugi::node_pcdata) text += n.value(); return false; });
Now, if you want to get the entire contents of the tag (aka outer xml), you can output a node to string stream, i.e.:
ostringstream oss;
node.print(oss);
string xml = oss.str();
Getting inner xml will require iterating through node's children and appending their outer xml to the result, i.e.
ostringstream oss;
for (pugi::xml_node_iterator it = node.begin(); it != node.end(); ++it)
it->print(oss);
string xml = oss.str();
That's how XML works. You can't embed a raw < in your values, and > is conventionally escaped as well. Escape them (e.g. using entities like &lt; and &gt;) or define a CDATA section.
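As a rough illustration of the escaping mentioned above, assuming you are assembling XML text by hand (pugixml escapes these characters itself when it serializes a document), a small helper could look like this. escape_xml_text is a hypothetical name, not a pugixml function, and the sketch covers element text only, not attribute values.

```cpp
#include <string>

// Replaces the characters that may not appear raw in XML element text
// with the corresponding predefined entities.
std::string escape_xml_text(const std::string& raw)
{
    std::string out;
    out.reserve(raw.size());
    for (char c : raw) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            default:  out += c;       break;
        }
    }
    return out;
}
```

The ampersand has to be handled too, otherwise already-escaped output could not be told apart from raw input.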
I struggled a lot with extracting a subtree including all of its elements and sub-nodes; the easiest way is almost what is shown here.
You should use this code:
ostringstream oss;
oNode.print(oss, "", format_raw);
sResponse = oss.str();
Replace oNode with the node you want and, if needed, prefix each pugixml name with pugi:: (for example pugi::format_raw).