Using pugixml to read an entire xml file


I know pugixml already offers a way to walk a whole document with pugi::xml_node::traverse, but I'm interested in how things work, so I want to reimplement it as a recursive function.
Currently I can only process the first level of the tree, because I don't know how to detect whether the current node has children (next_sibling() returns a null node).
// TODO: use std::ostringstream instead of std::string
void MyClass::recursive(const pugi::xml_node& start, std::string& output)
{
    // Nothing to do for a null node or a node without children
    if (!start.first_child()) {
        return;
    }
    // Process each child, then descend into it
    for (auto node : start.children()) {
        output += node.name();
        output += "\n";
        for (auto attribute : node.attributes()) {
            output += "Attribute Name : ";
            output += attribute.name();
            output += ", Attribute Value = ";
            output += attribute.value();
            output += " ";
        }
        output += "\n";
        const char* PCDATA = node.child_value();
        // child_value() returns an empty string (never null) when there is no text child;
        // comparing the pointer itself against "" would not test the contents
        output += PCDATA[0] == '\0' ? "[no pcdata]" : PCDATA;
        // The range-for already visits every sibling, so only recurse downward
        recursive(node, output);
    }
}
Sample XML file
<?xml version="1.0" encoding="UTF-8"?>
<root>
    <child1>
        <sub name="attr1">value</sub>
        <sub name="attr2">value</sub>
        <sub name="attr3">value</sub>
    </child1>
    <child2>
        <sub name="attr1">value</sub>
        <sub name="attr2">value</sub>
        <sub name="attr3">value</sub>
    </child2>
    <child3>
        <sub name="attr1">value</sub>
        <sub_with_children>
            <child1 name="[]">value</child1>
            <child2 name="[]">value</child2>
            <child3 name="[]">value</child3>
        </sub_with_children>
    </child3>
    <child4>
        <sub name="attr1">value</sub>
        <sub name="attr2">value</sub>
    </child4>
</root>
Edit: the code above is now working
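For readers who want the shape of the recursion without pugixml, here is a minimal sketch on a hypothetical stand-in type. Node, its fields, and dump are illustrative names and not part of the pugixml API. It shows that each call handles one node and loops over that node's children, so siblings never need a recursive call of their own.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an XML element; this is not a pugixml type.
struct Node {
    std::string name;
    std::vector<std::pair<std::string, std::string>> attributes;
    std::vector<Node> children;
};

// Depth-first, pre-order walk: emit the node, then recurse into each child.
// Siblings are covered by the loop in the caller, so the function never
// follows a "next sibling" link itself.
void dump(const Node& node, int depth, std::string& output)
{
    output.append(static_cast<std::size_t>(depth) * 2, ' ');  // indent by depth
    output += node.name;
    for (const auto& attribute : node.attributes) {
        output += " " + attribute.first + "=" + attribute.second;
    }
    output += "\n";
    for (const Node& child : node.children) {
        dump(child, depth + 1, output);
    }
}
```

With pugixml, the same shape should apply with node.children() and node.attributes() in place of the two vectors.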

Related

Issue in updating XML attribute value for every node with the same name using boost library

I am trying to update the value of totalresult attribute in every test_list node found. The issue is, it will only update the first test_list node found.
The testListCount will increment every time a test_list node is added. Once done adding test_list node, each totalresult value will then be updated in every test_list node.
Here is my code:
BOOST_FOREACH(ptree::value_type const & subTree, mainTree.get_child("my_report"))
{
    auto &nodeTestList = mainTree.get_child("my_report.test_list");
    BOOST_FOREACH(ptree::value_type const & subval, nodeTestList)
    {
        ptree subvalTree = subval.second;
        BOOST_FOREACH(ptree::value_type const & paramNode, subvalTree)
        {
            std::string name = paramNode.first;
            if (name == TestListAttrib[TestListParam::TOTALRESULT])
            {
                wxMessageBox("firing!");
                nodeTestList.put("<xmlattr>." + name, testListCount);
            }
        }
    }
}
Below is the actual result:
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="report.xsl"?>
<my_report>
<test_list overall_status="FAILED" result="1" totalresult="3"/>
<test_list overall_status="FAILED" result="2" totalresult=""/>
<test_list overall_status="FAILED" result="3" totalresult=""/>
</my_report>
Below is the expected result:
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="report.xsl"?>
<my_report>
<test_list overall_status="FAILED" result="1" totalresult="3"/>
<test_list overall_status="FAILED" result="2" totalresult="3"/>
<test_list overall_status="FAILED" result="3" totalresult="3"/>
</my_report>

Like others said, Property Tree is not an XML library (see What XML parser should I use in C++?).
That said, it looks like your error is here:
for (auto const &subTree : mainTree.get_child("my_report")) {
auto &nodeTestList = mainTree.get_child("my_report.test_list");
The second line doesn't use subTree at all; instead, it always looks up the first "my_report.test_list" node in mainTree, so every iteration updates that same node.
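The difference between "look up by name inside the loop" and "use the element the loop is visiting" can be sketched without Boost. The types and function names below (Attributes, Children, first_child_named, set_total_first_only, set_total_each) are hypothetical stand-ins, not Property Tree API; the sketch only assumes that a by-name lookup returns the first match.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a list of child elements:
// each entry is (element name, attribute map).
using Attributes = std::map<std::string, std::string>;
using Children = std::vector<std::pair<std::string, Attributes>>;

// Mimics a path lookup that stops at the first child with a matching name.
Attributes* first_child_named(Children& children, const std::string& name)
{
    for (auto& child : children) {
        if (child.first == name) return &child.second;
    }
    return nullptr;
}

// Buggy pattern: the loop runs once per child, but every iteration repeats
// the same lookup and therefore updates the same first element.
void set_total_first_only(Children& children, const std::string& total)
{
    for (auto& unused : children) {
        (void)unused;  // the loop variable is ignored, which is the bug
        if (Attributes* attrs = first_child_named(children, "test_list")) {
            (*attrs)["totalresult"] = total;
        }
    }
}

// Fixed pattern: update the element the loop is currently visiting.
void set_total_each(Children& children, const std::string& total)
{
    for (auto& child : children) {
        if (child.first == "test_list") {
            child.second["totalresult"] = total;
        }
    }
}
```

Called on three test_list entries, the first function fills in only the first entry, while the second fills in all three.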
Use Modern C++ And Compiler Warnings
I made the code self-contained C++11:
#include <boost/property_tree/xml_parser.hpp>
#include <array>
#include <fstream>
#include <string>
using boost::property_tree::ptree;
enum TestListParam { OVERALLSTATUS, TOTALRESULT };
std::array<std::string, 2> TestListAttrib{ "overall_status", "totalresult" };

int main() {
    ptree mainTree;
    {
        std::ifstream ifs("input.xml");
        read_xml(ifs, mainTree);
    }
    auto const testListCount = 3;
    for (auto const& subTree : mainTree.get_child("my_report")) {
        auto& nodeTestList = mainTree.get_child("my_report.test_list");
        for (auto& subval : nodeTestList) {
            ptree subvalTree = subval.second;
            for (auto& paramNode : subvalTree) {
                std::string name = paramNode.first;
                if (name == TestListAttrib[TestListParam::TOTALRESULT]) {
                    nodeTestList.put("<xmlattr>." + name, testListCount);
                }
            }
        }
    }
}
If you enable compiler warnings, you will see your error:
prog.cc:16:22: warning: unused variable 'subTree' [-Wunused-variable]
    for (auto const& subTree : mainTree.get_child("my_report")) {
                     ^
1 warning generated.
More Modern C++
With the niceties of C++17, things become cleaner and easier to fix. Here's a first shot, which also prints the output:
#include <boost/property_tree/xml_parser.hpp>
#include <array>
#include <fstream>
#include <iostream>
#include <string>
using boost::property_tree::ptree;
auto const pretty = boost::property_tree::xml_writer_make_settings<std::string>(' ', 4);
enum TestListParam { OVERALLSTATUS, TOTALRESULT };
std::array<std::string, 2> TestListAttrib{ "overall_status", "totalresult" };

int main() {
    ptree mainTree;
    {
        std::ifstream ifs("input.xml");
        read_xml(ifs, mainTree);
    }
    auto const testListCount = 3;
    // Visit every child of my_report, then every attribute of that child
    for (auto& [key, subTree] : mainTree.get_child("my_report"))
        for (auto& [name, node] : subTree.get_child("<xmlattr>")) {
            if (name == TestListAttrib[TestListParam::TOTALRESULT]) {
                node.put_value(testListCount);
            }
        }
    write_xml(std::cout, mainTree, pretty);
}
Prints: (whitespace reduced)
<?xml version="1.0" encoding="utf-8"?>
<my_report>
<test_list overall_status="FAILED" result="1" totalresult="3"/>
<test_list overall_status="FAILED" result="2" totalresult="3"/>
<test_list overall_status="FAILED" result="3" totalresult="3"/>
</my_report>
Caveats
Note that, because of the way the loops are written:
- the code will fail if <xmlattr> or my_report is not found;
- conversely, it will erroneously descend into all child nodes of my_report, even if they are not named test_list;
- the XSL processing instruction is lost. Once again, this is inherent, because Boost Property Tree doesn't know about XML: it uses a subset of XML to implement serialization for property trees.
To fix the first two bullets, I'd suggest making a helper to query nodes from your XML (from Iterating on xml file with boost):
enumerate_nodes(mainTree,
"my_report.test_list.<xmlattr>.totalresult",
back_inserter(nodes));
This doesn't suffer from any of the problems mentioned, and you can elegantly assign to all matching nodes:
for (ptree& node : nodes)
node.put_value(3);
If you really didn't /want/ to require the test_list node name, use a wildcard:
enumerate_nodes(mainTree,
"my_report.*.<xmlattr>.totalresult",
back_inserter(nodes));
Live Demo
#include <boost/property_tree/xml_parser.hpp>
#include <array>
#include <fstream>
#include <functional>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
using boost::property_tree::ptree;
auto const pretty = boost::property_tree::xml_writer_make_settings<std::string>(' ', 4);
enum TestListParam { OVERALLSTATUS, TOTALRESULT };
std::array<std::string, 2> TestListAttrib{ "overall_status", "totalresult" };

// Writes every node matching `path` to `out`; "*" matches any name in a non-final segment
template <typename Ptree, typename Out>
Out enumerate_nodes(Ptree& pt, ptree::path_type path, Out out) {
    if (path.empty())
        return out;
    if (path.single()) {
        // Last path segment: emit all children with that name
        auto name = path.reduce();
        for (auto& child : pt) {
            if (child.first == name)
                *out++ = child.second;
        }
    } else {
        // Strip the first segment and recurse into every matching child
        auto head = path.reduce();
        for (auto& child : pt) {
            if (head == "*" || child.first == head) {
                out = enumerate_nodes(child.second, path, out);
            }
        }
    }
    return out;
}

int main() {
    ptree mainTree;
    {
        std::ifstream ifs("input.xml");
        read_xml(ifs, mainTree);
    }
    std::vector<std::reference_wrapper<ptree> > nodes;
    enumerate_nodes(mainTree,
                    "my_report.test_list.<xmlattr>.totalresult",
                    back_inserter(nodes));
    for (ptree& node : nodes)
        node.put_value(3);
    write_xml(std::cout, mainTree, pretty);
}
Prints
<?xml version="1.0" encoding="utf-8"?>
<my_report>
<test_list overall_status="FAILED" result="1" totalresult="3"/>
<test_list overall_status="FAILED" result="2" totalresult="3"/>
<test_list overall_status="FAILED" result="3" totalresult="3"/>
</my_report>

Lastchild in xml file using QDomDocument Class

I have this XML:
<VCAAnalysis>
    <VCAStream>
        <VCAFrame width="768" height="432" rtptime="" utctime="102157000" utctimeHigh="0" configID="0" />
        <VCAFrame width="768" height="432" rtptime="" utctime="102157160" utctimeHigh="0" configID="0">
            <Object objectID="138.96.200.59_20160126_102157160_1" minX="276" minY="0" maxX="320" maxY="123" width="44" height="123" ObjPropTag="PERSON">
            </Object>
        </VCAFrame>
        <VCAFrame width="768" height="432" rtptime="" utctime="102157320" utctimeHigh="0" configID="0" />
        <VCAFrame width="768" height="432" rtptime="" utctime="102157480" utctimeHigh="0" configID="0">
            <Object objectID="138.96.200.59_20160126_102157480_2" minX="224" minY="264" maxX="287" maxY="343" width="63" height="79" ObjPropTag="PERSON">
            </Object>
        </VCAFrame>
        <VCAFrame width="768" height="432" rtptime="" utctime="102157640" utctimeHigh="0" configID="0">
            <Object objectID="138.96.200.59_20160126_102157480_3" minX="204" minY="266" maxX="331" maxY="400" width="127" height="134" ObjPropTag="PERSON">
            </Object>
        </VCAFrame>
        <VCAFrame width="768" height="432" rtptime="" utctime="102157000" utctimeHigh="0" configID="0" />
    </VCAStream>
</VCAAnalysis>
I want to get the last objectID (138.96.200.59_20160126_102157480_3), that is, the one in the last VCAFrame that contains an Object.
I tried this code, but it doesn't work:
QDomNodeList a = VCAStream.elementsByTagName("VCAFrame");
if(a.size()!=0) {
    QDomElement lastobj = VCAStream.lastChild().toElement();
    QDomElement last = lastobj.firstChild().toElement();
    QString lastid = last.attribute("objectID");
    cout << qPrintable("laaaaaaaast "+lastid) << endl;
}

This worked for me:
QDomNodeList vcaStreams = VCAStream.elementsByTagName("VCAStream");
QDomNodeList vcaFrames = vcaStreams.at(0).childNodes(); //Gives 6 VCAFrame tags
QDomNodeList vcaObjects = vcaFrames.at(4).childNodes(); //Gives 1 Object tag
qDebug() << vcaObjects.at(0).toElement().attribute("objectID");
lastobj in your code refers to the last VCAFrame, which does not have an objectID.
EDIT: If you need to iterate over an entire XML file, the code below does that. I'm assuming that, in each VCAStream, you want the last VCAFrame that has an objectID.
QDomNodeList vcaStreams = VCAStream.elementsByTagName("VCAStream");
for (int i = 0; i < vcaStreams.count(); ++i) {
    QDomNodeList vcaFrames = vcaStreams.at(i).childNodes(); // All VCAFrame tags of this stream
    // Find the last tag with an objectID
    QDomElement last;
    for (int j = vcaFrames.count() - 1; j >= 0; --j) {
        // Assumes there is at most one <Object> tag in each VCAFrame
        if (vcaFrames.at(j).hasChildNodes()) {
            QDomElement tmp = vcaFrames.at(j).firstChild().toElement();
            if (tmp.hasAttribute("objectID")) {
                last = tmp;
                break;
            }
        }
    }
    // last now holds the Object element of the last VCAFrame that has one, or is null
    if (last.isNull())
        qDebug() << "No objectID found";
    else
        qDebug() << last.attribute("objectID");
}
I tested this on your XML file and it gave me the correct result, but I did not try adding more than one VCAStream tag.
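The backwards scan in the loop above can also be written as a reverse search. Here is a minimal sketch using only the standard library; Frame and last_object_id are hypothetical names standing in for the Qt DOM types, and each frame is reduced to the list of objectID values it contains.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a <VCAFrame>: the objectID values of its <Object> children.
struct Frame {
    std::vector<std::string> objectIds;
};

// Returns the last objectID of the last frame that contains any object,
// or an empty string when no frame has one.
std::string last_object_id(const std::vector<Frame>& frames)
{
    // Search from the back so the first hit is the last frame with an object
    auto it = std::find_if(frames.rbegin(), frames.rend(),
                           [](const Frame& f) { return !f.objectIds.empty(); });
    return it == frames.rend() ? std::string() : it->objectIds.back();
}
```

Unlike the Qt loop, this sketch takes the last object of the frame rather than the first child, so it does not rely on the one-object-per-frame assumption.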

Parse XML with QXmlStreamReader

I created this xml file with QXmlStreamWriter:
<?xml version="1.0" encoding="UTF-8"?>
<Draw>
    <Input>
        <Column title="A"/>
        <Column title="B"/>
        <Column title="C"/>
        <Column title="D">
            <item id="0">Bayer Leverkusen</item>
            <item id="1">Benfica</item>
            <item id="2">Villareal</item>
            <item id="3">Montpellier</item>
        </Column>
    </Input>
</Draw>
I would like to build a vector of strings containing all the items inside the <Column title="D"> tag. I know how to create a QVector and how to add elements to it; I just need to figure out how to extract this information from the XML file.
Can you help me?

You can use QXmlStreamReader to iterate through the XML elements and find the <Column title="D"> element. Once you have found it, readNextStartElement() in combination with skipCurrentElement() can be used to read all of its child elements.
Let's assume that the XML document shown in your example can be read from the xmlDocument object. To extract all <item> elements from the <Column title="D"> element with appropriate error checking, you can do the following:
QXmlStreamReader xmlIterator(xmlDocument);
QVector<QString> output;
for(; !xmlIterator.atEnd(); xmlIterator.readNext()) {
    if(isStartElementOfColumnD(xmlIterator)) {
        while(xmlIterator.readNextStartElement()) {
            if(isItemElement(xmlIterator))
                output.append(xmlIterator.readElementText());
            else
                xmlIterator.skipCurrentElement();
        }
    }
}
if(xmlIterator.hasError())
    qCritical() << "Error has occurred:" << xmlIterator.errorString();
else
    qDebug() << output;
In the example above I used two predicates to hide the long, hard-to-read checks on xmlIterator. They are the following:
inline bool isStartElementOfColumnD(const QXmlStreamReader& xmlIterator) {
    return xmlIterator.isStartElement() && xmlIterator.name() == "Column" &&
           xmlIterator.attributes().value("title") == "D";
}
inline bool isItemElement(const QXmlStreamReader& xmlIterator) {
    return xmlIterator.name() == "item" &&
           xmlIterator.attributes().hasAttribute("id");
}
Sample result:
QVector("Bayer Leverkusen", "Benfica", "Villareal", "Montpellier")
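The state a pull parser makes you track ("am I inside the wanted column, and inside an item?") can be sketched without Qt. Everything below is a hypothetical model: Token, Event, and items_of_column are illustrative names, and the event list stands in for what a stream reader would report token by token. It is not the QXmlStreamReader API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical token kinds standing in for what a pull parser reports.
enum class Token { StartElement, Characters, EndElement };

struct Event {
    Token type;
    std::string name;   // element name (empty for Characters)
    std::string value;  // title attribute for StartElement, text for Characters
};

// Collects the text of every <item> that appears inside the <Column>
// whose title attribute equals `title`.
std::vector<std::string> items_of_column(const std::vector<Event>& events,
                                         const std::string& title)
{
    std::vector<std::string> output;
    bool inColumn = false;  // inside the wanted <Column>
    bool inItem = false;    // inside an <item> of that column
    for (const Event& e : events) {
        switch (e.type) {
        case Token::StartElement:
            if (e.name == "Column" && e.value == title) {
                inColumn = true;
            } else if (inColumn && e.name == "item") {
                inItem = true;
                output.emplace_back();  // start a new, empty item text
            }
            break;
        case Token::Characters:
            if (inItem) output.back() += e.value;
            break;
        case Token::EndElement:
            if (e.name == "item") inItem = false;
            else if (e.name == "Column") inColumn = false;
            break;
        }
    }
    return output;
}
```

In the Qt answer, readNextStartElement() and skipCurrentElement() do this bookkeeping for you, which is why the real code needs no explicit flags.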

I would write it in the following way:
QVector<QString> store;
[..]
if (reader.readNextStartElement() && reader.name() == "Draw") {
    while (reader.readNextStartElement() && reader.name() == "Input") {
        while (reader.readNextStartElement()) {
            QXmlStreamAttributes attr = reader.attributes();
            if (reader.name() == "Column" && attr.value("title").toString() == "D") {
                while(!(reader.isEndElement() && reader.name() == "Column")) {
                    if (reader.isStartElement() && reader.name() == "item") {
                        QString text = reader.readElementText();
                        store.append(text);
                    }
                    reader.readNext();
                    if (reader.hasError()) {
                        // Handle error.
                        QString msg = reader.errorString();
                        break;
                    }
                }
            } else {
                reader.readNext();
            }
        }
    }
} else {
    reader.raiseError("Expected <Draw> element");
}
[..]

TinyXML getting Value

Given XML like:
<a>
    <result>0</result>
    <data>I9C3J9N3cCTZdKGK+itJW1Q==</data>
</a>
I need to get the fact that <result> is 0 and act upon it.
I am doing:
TiXmlDocument doc;
bool bOK = doc.Parse((const char*)chunk.memory, 0, TIXML_ENCODING_UTF8);
if (bOK)
{
    TiXmlHandle hDoc(&doc);
    TiXmlElement *pRoot, *pParm, *pParm2;
    pRoot = doc.FirstChildElement("a");
    if(pRoot)
    {
        pParm = pRoot->FirstChildElement("result");
        if (pParm)
        {
            if (pParm->GetText()=="0")
            {
                pParm2 = pRoot->NextSiblingElement("data");
                if (pParm2)
                {
                    sValue = pParm2->GetText();
                    std::cout << "sValue: " << sValue << std::endl;
                }
            }
        }
    }
}
I thought that GetText() was the right answer, but I am doing something wrong because I never get inside the if to check the <data> element.
Can anyone shed some light for me?

That's because, in your case, <data> isn't a sibling of <a>; it is a sibling of <result>.
You're calling pRoot->NextSiblingElement("data") when you should call pParm->NextSiblingElement("data").
You could also change it to
pParm2 = pRoot->FirstChildElement("data");
Edit:
Sorry, I thought you were referring to this if:
if (pParm2)
So, the solution could be this:
if (std::string(pParm->GetText())=="0")
or
if (strcmp(pParm->GetText(), "0"))
You choose. I prefer the first one.
Edit 2:
I'm really sorry, I forgot that strcmp returns 0 when the two strings are equal (and a nonzero value otherwise), so in your case it should be:
if (strcmp(pParm->GetText(), "0") == 0)
You need to include <string.h> (or <cstring>) too.
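Both fixes above assume GetText() returns a valid pointer, but it returns null for an element that has no text, and neither strcmp nor the std::string constructor accepts a null pointer. A small sketch of a guarded comparison follows; text_equals is a hypothetical helper name, not part of TinyXML.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper (not part of TinyXML): compares element text with an
// expected value by content rather than by address. A null `text`, as
// returned for an element without text, compares unequal.
bool text_equals(const char* text, const char* expected)
{
    return text != nullptr && std::strcmp(text, expected) == 0;
}
```

In the question's code this would read if (text_equals(pParm->GetText(), "0")).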

boost::property_tree - parsing and processing data

I have just discovered boost::property_tree, which seems the perfect answer to my problem. I wrote a small test program to extract specific data from an xml file. I have used the example provided in the documentation as a guide.
The xml file: test.xml:
<section>
    <GROUP>
        <g_name>ABC</g_name>
        <fields>
            <row>
                <name>A</name>
                <datatype>string</datatype>
                <field_size>6</field_size>
                <value>ABC</value>
            </row>
            <row>
                <name>B</name>
                <datatype>integer</datatype>
                <field_size>5</field_size>
                <value>00107</value>
            </row>
            <row>
                <name>C</name>
                <datatype>string</datatype>
                <field_size>20</field_size>
                <value>LOTS OF LETTERS </value>
            </row>
        </fields>
    </GROUP>
    <GROUP>
        <g_name>CDE</g_name>
        <fields>
            <row>
                <name>A</name>
                <datatype>string</datatype>
                <field_size>6</field_size>
                <value>CDE</value>
            </row>
            <row>
                <name>B</name>
                <datatype>integer</datatype>
                <field_size>5</field_size>
                <value>00100</value>
            </row>
            <row>
                <name>F</name>
                <datatype>integer</datatype>
                <field_size>4</field_size>
                <value>1970</value>
            </row>
        </fields>
    </GROUP>
</section>
The code:
#include <boost/foreach.hpp>
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/xml_parser.hpp>
#include <cstdio>
#include <exception>
#include <iostream>
#include <string>
using boost::property_tree::ptree;
struct t_collection
{
    ptree pt;
    void load(const std::string &filename);
    void print();
};
void t_collection::load(const std::string &filename)
{
    read_xml(filename, pt);
}
void t_collection::print()
{
    BOOST_FOREACH(ptree::value_type &v, pt.get_child("section.GROUP"))
    {
        printf("X: %s->", v.second.data().c_str());
        //prints X: ABC ->
        BOOST_FOREACH(ptree::value_type &w, pt.get_child("section.GROUP.fields.row"))
            printf("%s\n", w.second.data().c_str());
        //prints A, string, 6, ABC - that is good for first iteration but there should be 3 iterations here
    }
    //then prints X: and just "" and repeats the set from the first one
}
int main()
{
    try
    {
        t_collection t1;
        t1.load("test.xml");
        t1.print();
    }
    catch (std::exception &e)
    {
        std::cout << "Error: " << e.what() << "\n";
    }
    return 0;
}
Note: I am trying to extract the values (ABC and the inner values, such as A - string - 6 - ABC) for each GROUP and each set of "row", which I will then process and output in a different format. Please see the comments in the code for what I tried.
So far the best result was with the following contents of print():
BOOST_FOREACH(ptree::value_type &z, pt.get_child("section"))
//BOOST_FOREACH(ptree::value_type &v, pt.get_child("section.GROUP"))
{
    printf("X: %s->", pt.get<std::string>("section.GROUP.g_mame", "default").c_str());
    //prints X: ABC ->
    BOOST_FOREACH(ptree::value_type &w, pt.get_child("section.GROUP.fields.row"))
    {
        printf("%s\n", pt.get<std::string>("section.GROUP.fields.row.name", "name").c_str());
        printf("%s\n", pt.get<std::string>("section.GROUP.fields.row.datatype", "type").c_str());
        printf("%s\n", pt.get<std::string>("section.GROUP.fields.row.field_size", "size").c_str());
        printf("%s\n", pt.get<std::string>("section.GROUP.fields.row.value", "value").c_str());
    }
}
//prints x: default->A, string, 6, ABC (3 times) then repeat identically
I can't get the data from more than one record! Please give me a suggestion: what am I doing wrong?
Thank you.

You are missing a level in your iteration. You need to iterate over the elements that have multiple children with the same name.
std::pair<ptree::const_assoc_iterator, ptree::const_assoc_iterator>
    r(pt.get_child("section").equal_range("GROUP"));
for (ptree::const_assoc_iterator i(r.first); i != r.second; ++i) {
    // Do something with each group.
}
Repeat as appropriate as you descend the tree.
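The equal_range idiom may be easier to see on a standard container first. The sketch below uses std::multimap as a hypothetical stand-in for "children keyed by element name"; Children and values_for are illustrative names and not Property Tree API. The returned pair of iterators delimits every entry with the given key, not just the first one.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for "children keyed by element name"; std::multimap
// allows repeated keys, like several <GROUP> elements under one parent.
using Children = std::multimap<std::string, std::string>;

// Visits every value stored under `key`, not just the first one.
std::vector<std::string> values_for(const Children& children, const std::string& key)
{
    std::vector<std::string> result;
    auto range = children.equal_range(key);  // [first match, one past last match)
    for (auto it = range.first; it != range.second; ++it) {
        result.push_back(it->second);
    }
    return result;
}
```

With the question's data, the same loop would run once per GROUP, and a nested equal_range("row") on each group's fields child would then reach every row.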
