RapidXML giving empty CDATA nodes - c++

I wrote the code below to read the value of a CDATA node. I get the node's name, but the value comes back empty.
I changed the parse flags to parse_full, but that did not help either.
If I manually remove "<![CDATA[" and "]]>" from the XML, I get the value as expected, but stripping them before parsing is not an option.
The code:
#include <iostream>
#include <vector>
#include <sstream>
#include "rapidxml/rapidxml_utils.hpp"
using std::vector;
using std::stringstream;
using std::cout;
using std::endl;
int main(int argc, char* argv[]) {
rapidxml::file<> xmlFile("test.xml");
rapidxml::xml_document<> doc;
doc.parse<rapidxml::parse_full>(xmlFile.data());
rapidxml::xml_node<>* nodeFrame = doc.first_node()->first_node()->first_node();
cout << "BEGIN\n\n";
do {
cout << "name: " << nodeFrame->first_node()->name() << "\n";
cout << "value: " << nodeFrame->first_node()->value() << "\n\n";
} while( nodeFrame = nodeFrame->next_sibling() );
cout << "END\n\n";
return 0;
}
The XML:
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0" xmlns:c="http://base.google.com/cns/1.0">
<itens>
<item>
<title><![CDATA[Title 1]]></title>
<g:id>34022</g:id>
<g:price>2173.00</g:price>
<g:sale_price>1070.00</g:sale_price>
</item>
<item>
<title><![CDATA[Title 2]]></title>
<g:id>34021</g:id>
<g:price>217.00</g:price>
<g:sale_price>1070.00</g:sale_price>
</item>
</itens>
</rss>

RapidXML stores a CDATA section as a separate child node (of type node_cdata) below the enclosing element. The element's own value() is only filled in from plain character data, so it stays empty for <title><![CDATA[Title 1]]></title>.
Your code correctly gets 'title' with nodeFrame->first_node()->name(), but because the CDATA text lives in that child node, you need to go one level deeper to extract the value:
cout << "value: " << nodeFrame->first_node()->first_node()->value() << "\n\n";

Related

Parsing an XML document

I want to parse an XML document in C++ and be able to identify what text exists in a particular tag. I have checked parsers like TinyXML and PugiXML, but none of them seem to identify the tags separately. How can I achieve this?
Using RapidXml, you can traverse the nodes and attributes and identify the text of their tag.
#include <iostream>
#include <rapidxml.hpp>
#include <rapidxml_utils.hpp>
int main()
{
using namespace rapidxml;
file<> in("input.xml"); // Load the file into memory.
xml_document<> doc;
doc.parse<0>(in.data()); // Parse the file in place.
// Traverse the first-level elements with plain pointers; this avoids
// constructing an end iterator from a null pointer.
for (xml_node<>* node = doc.first_node(); node; node = node->next_sibling())
{
std::cout << node->name() << '\n'; // Write the tag name.
// Traverse the attributes of the element.
for (xml_attribute<>* attr = node->first_attribute(); attr; attr = attr->next_attribute())
{
std::cout << attr->name() << '\n'; // Write the attribute name.
}
}
}
To get all tag names with pugixml:
void dumpTags(const pugi::xml_node& node) {
if (!node.empty()) {
std::cout << node.name() << std::endl;
for (pugi::xml_node child=node.first_child(); child; child=child.next_sibling())
dumpTags(child);
}
}
pugi::xml_document doc;
pugi::xml_parse_result result = doc.load_string("<tag1>abc<tag2>def</tag2>pqr</tag1>"); // load_string() in pugixml 1.5 and later; older versions call it load()
dumpTags(doc.first_child());

Trouble using get_value with Boost's property trees

I have to write an XML parser with Boost, but I am having some trouble.
I can access the node names without any problem, but for some reason I can't access the attributes inside a tag with get_value, which I expected to just work. Maybe there is a mistake in my code that I didn't spot? Take a look:
void ParametersGroup::load(const boost::property_tree::ptree &pt)
{
using boost::property_tree::ptree;
BOOST_FOREACH(const ptree::value_type& v, pt)
{
name = v.second.get_value<std::string>("name");
std::string node_name = v.first;
if (node_name == "<xmlattr>" || node_name == "<xmlcomment>")
continue;
else if (node_name == "ParametersGroup")
sg.load(v.second); // Recursion to go deeper
else if (node_name == "Parameter")
{
// Do stuff
std::cout << "PARAMETER_ELEM" << std::endl;
std::cout << "name: " << name << std::endl;
std::cout << "node_name: " << node_name << std::endl << std::endl;
}
else
{
std::cerr << "FATAL ERROR: XML document contains a non-recognized element: " << node_name << std::endl;
exit(-1);
}
}
}
So basically I ignore <xmlattr> and <xmlcomment> tags; when I'm in a ParametersGroup tag I go deeper, and when I'm in a Parameter tag I retrieve the data to do stuff. However, I can't get the "name" properly.
This is the kind of line I'm scanning in the last else if:
<Parameter name="box">
The std::cout << name displays things like that:
name: ^M
^M
^M
^M
^M
^M
which is obviously not what I'm asking for.
What am I doing wrong? Any help would be greatly appreciated.
Since your question isn't particularly self-contained, here's my self-contained counterexample:
#include <sstream>
#include <iostream>
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/xml_parser.hpp>
using namespace boost::property_tree;
int main() {
ptree pt;
std::istringstream iss("<Parameter name=\"box\" />");
xml_parser::read_xml(iss, pt);
for (auto& element : pt)
{
std::cout << "'" << element.first << "'\n";
for (auto& attr : element.second)
{
std::cout << "'" << attr.first << "'\n";
for (auto& which : attr.second)
{
std::cout << "'" << which.first << "': \"" << which.second.get_value<std::string>() << "\"\n";
}
}
}
}
It prints
'Parameter'
'<xmlattr>'
'name': "box"
I hope you can see what you need to do (likely an unexpected level of nodes in the tree?). To get directly to the leaf node:
pt.get_child("Parameter.<xmlattr>.name").get_value<std::string>()

boost read_xml from stringstream does not read xml format

I want to fill a boost::property_tree::ptree with the data from an XML document.
The XML is in a string, which I pass to a stringstream and then try
to read with read_xml, but the ptree looks null or empty when I inspect the object
while debugging. My code:
std::stringstream ss;
ss << "<?xml ?><root><test /></root>";
boost::property_tree::ptree pt;
boost::property_tree::xml_parser::read_xml( ss, pt);
result:
pt {m_data="" m_children=0x001dd3b0 }
Before that, I tried a string with this XML:
<?xml version="1.0"?><Response Location="910" RequesterId="12" SequenceNumber="0">
<Id>1</Id>
<Type>P</Type>
<StatusMessage></StatusMessage>
<Message>Error</Message>
</Response>
But nothing works. I am using Visual Studio with C++.
There is no data associated with the root node, so m_data is empty, but there is a child node (test), so m_children != nullptr.
Please consider this example:
#include <iostream>
#include <sstream>
#include <string>
#include <boost/property_tree/xml_parser.hpp>
int main()
{
std::stringstream ss;
ss << "<?xml ?><root><test /></root>";
boost::property_tree::ptree pt;
boost::property_tree::xml_parser::read_xml(ss, pt);
// There is no data associated with root node...
std::string s(pt.get<std::string>("root"));
std::cout << "EXAMPLE1" << std::endl << "Data associated with root node: " << s << std::endl;
// ...but there is a child node.
std::cout << "Children of root node: ";
for (const auto& r : pt.get_child("root"))
std::cout << r.first << std::endl;
std::cout << std::endl << std::endl;
std::stringstream ss2;
ss2 << "<?xml ?><root>dummy</root>";
boost::property_tree::xml_parser::read_xml(ss2, pt);
// This time we have a string associated with root node
std::string s2(pt.get<std::string>("root"));
std::cout << "EXAMPLE2" << std::endl << "Data associated with root node: " << s2 << std::endl;
return 0;
}
It'll print:
EXAMPLE1
Data associated with root node:
Children of root node: test
EXAMPLE2
Data associated with root node: dummy
(http://coliru.stacked-crooked.com/a/34a99abb0aca78f2).
The Boost Property Tree library doesn’t fully document its capabilities, but a good guide to parsing XML with Boost is http://akrzemi1.wordpress.com/2011/07/13/parsing-xml-with-boost/

Using vertex_name when reading a GraphML file with Boost Graph

I am trying to load a simple GraphML file such that each vertex has a vertex name as stored in the GraphML. I can change the GraphML, the important thing is that I have access to the vertex_name from code afterwards.
Here's the most minimal example that I could extract that still shows the problem:
#include <iostream>
#include <string>
#include <fstream>
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/graphml.hpp>
int main()
{
using namespace boost;
typedef adjacency_list<vecS, vecS, directedS,property<vertex_name_t,std::string> > BoostGraphType;
typedef dynamic_properties BoostDynamicProperties;
std::string fn = "simple.graphml";
std::ifstream is(fn.c_str());
if (!is.is_open())
{
std::cout << "loading file '" << fn << "'failed." << std::endl;
throw "Could not load file.";
}
BoostGraphType g;
BoostDynamicProperties dp ;
const std::string vn = "vertex_name";
dp.property(vn,get(vertex_name,g));
read_graphml(is, g, dp);
for (auto vp = vertices(g); vp.first != vp.second; ++vp.first)
{
std::cout << "index '" << get(vertex_index,g,*vp.first) << "' ";
std::cout << "name '" << get(vertex_name,g,*vp.first) << "'"
<< std::endl;
}
return 0;
}
I am using the following GraphML test file:
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key id="d0" for="node" attr.name="vertex_name" attr.type="string"/>
<graph id="G" edgedefault="directed">
<node id="A"> <data key="d0">A</data> </node>
<node id="B"> <data key="d0">B</data> </node>
<edge id="0" source="A" target="B"/>
</graph>
</graphml>
read_graphml throws an exception with the helpful message (e.what()):
parse error: unrecognized type "
It seems the problem is related to the vertex_name association (which I got from a comment to a previous question of mine).
If I remove
<data key="d0">A</data>
from the node, it works.
However, I need to be able to identify the vertices by vertex_name.
How can I fix this so it properly parses the graphml and does not throw? What am I doing wrong?
Your code works perfectly when I run it.
>wilbert.exe
index '0' name 'A'
index '1' name 'B'
This is using Boost v1.52 on Windows 7.

Parsing XML Attributes with Boost

I would like to share with you an issue I'm having while trying to process some attributes from XML elements in C++ with Boost libraries (version 1.52.0). Given the following code:
#define ATTR_SET ".<xmlattr>"
#define XML_PATH1 "./pets.xml"
#include <iostream>
#include <string>
#include <boost/foreach.hpp>
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/xml_parser.hpp>
using namespace std;
using namespace boost;
using namespace boost::property_tree;
const ptree& empty_ptree(){
static ptree t;
return t;
}
int main() {
ptree tree;
read_xml(XML_PATH1, tree);
const ptree & formats = tree.get_child("pets", empty_ptree());
BOOST_FOREACH(const ptree::value_type & f, formats){
string at = f.first + ATTR_SET;
const ptree & attributes = formats.get_child(at, empty_ptree());
cout << "Extracting attributes from " << at << ":" << endl;
BOOST_FOREACH(const ptree::value_type &v, attributes){
cout << "First: " << v.first.data() << " Second: " << v.second.data() << endl;
}
}
}
Let's say I have the following XML structure:
<?xml version="1.0" encoding="utf-8"?>
<pets>
<cat name="Garfield" weight="4Kg">
<somestuff/>
</cat>
<dog name="Milu" weight="7Kg">
<somestuff/>
</dog>
<bird name="Tweety" weight="0.1Kg">
<somestuff/>
</bird>
</pets>
With this, the console output is the following:
Extracting attributes from cat.<xmlattr>:
First: name Second: Garfield
First: weight Second: 4Kg
Extracting attributes from dog.<xmlattr>:
First: name Second: Milu
First: weight Second: 7Kg
Extracting attributes from bird.<xmlattr>:
First: name Second: Tweety
First: weight Second: 0.1Kg
However, if I decide to use a common structure for every element directly under the root node (in order to identify them by their attributes), the result changes completely. The XML file in that case might be:
<?xml version="1.0" encoding="utf-8"?>
<pets>
<pet type="cat" name="Garfield" weight="4Kg">
<somestuff/>
</pet>
<pet type="dog" name="Milu" weight="7Kg">
<somestuff/>
</pet>
<pet type="bird" name="Tweety" weight="0.1Kg">
<somestuff/>
</pet>
</pets>
And the output would be the following:
Extracting attributes from pet.<xmlattr>:
First: type Second: cat
First: name Second: Garfield
First: weight Second: 4Kg
Extracting attributes from pet.<xmlattr>:
First: type Second: cat
First: name Second: Garfield
First: weight Second: 4Kg
Extracting attributes from pet.<xmlattr>:
First: type Second: cat
First: name Second: Garfield
First: weight Second: 4Kg
It seems the number of elements hanging from the root node is being properly recognized since three sets of attributes have been printed. Nevertheless, all of them refer to the attributes of the very first element...
I'm not an expert in C++ and really new to Boost, so this might be something I'm missing with respect to hash mapping processing or so... Any advice will be much appreciated.
The problem with your program is located in this line:
const ptree & attributes = formats.get_child(at, empty_ptree());
With this line you ask for the child pet.<xmlattr> of pets, and you do so three times regardless of which f you are currently visiting, so the path lookup resolves to the same pet element every time. Following this article, I'd guess that what you need is:
const ptree & attributes = f.second.get_child("<xmlattr>", empty_ptree());
The full code, that works with both your xml files, is:
#define ATTR_SET ".<xmlattr>"
#define XML_PATH1 "./pets.xml"
#include <iostream>
#include <string>
#include <boost/foreach.hpp>
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/xml_parser.hpp>
using namespace std;
using namespace boost;
using namespace boost::property_tree;
const ptree& empty_ptree(){
static ptree t;
return t;
}
int main() {
ptree tree;
read_xml(XML_PATH1, tree);
const ptree & formats = tree.get_child("pets", empty_ptree());
BOOST_FOREACH(const ptree::value_type & f, formats){
string at = f.first + ATTR_SET;
const ptree & attributes = f.second.get_child("<xmlattr>", empty_ptree());
cout << "Extracting attributes from " << at << ":" << endl;
BOOST_FOREACH(const ptree::value_type &v, attributes){
cout << "First: " << v.first.data() << " Second: " << v.second.data() << endl;
}
}
}
I have not used this feature so far, but I suspect that the boost::property_tree XML parser isn't a general-purpose XML parser and instead expects a certain schema, with exactly one specific tag per property.
You might prefer another XML parser that can handle any XML schema if you want to work with XML beyond boost::property_tree's capabilities. Have a look at e.g. Xerces C++ or Poco XML.
File to be parsed, pets.xml
<pets>
<pet type="cat" name="Garfield" weight="4Kg">
<something name="test" value="*"/>
<something name="demo" value="#"/>
</pet>
<pet type="dog" name="Milu" weight="7Kg">
<something name="test1" value="$"/>
</pet>
<birds type="parrot">
<bird name="african grey parrot"/>
<bird name="amazon parrot"/>
</birds>
</pets>
code:
// DemoPropertyTree.cpp : Defines the entry point for the console application.
//Prerequisite boost library
#include "stdafx.h"
#include <boost/property_tree/xml_parser.hpp>
#include <boost/property_tree/ptree.hpp>
#include <boost/foreach.hpp>
#include<iostream>
using namespace std;
using namespace boost;
using namespace boost::property_tree;
void processPet(ptree subtree)
{
BOOST_FOREACH(ptree::value_type petChild,subtree.get_child(""))
{
//processing attributes of element pet
if(petChild.first=="<xmlattr>")
{
BOOST_FOREACH(ptree::value_type petAttr,petChild.second.get_child(""))
{
cout<<petAttr.first<<"="<<petAttr.second.data()<<endl;
}
}
//processing child element of pet(something)
else if(petChild.first=="something")
{
BOOST_FOREACH(ptree::value_type somethingChild,petChild.second.get_child(""))
{
//processing attributes of element something
if(somethingChild.first=="<xmlattr>")
{
BOOST_FOREACH(ptree::value_type somethingAttr,somethingChild.second.get_child(""))
{
cout<<somethingAttr.first<<"="<<somethingAttr.second.data()<<endl;
}
}
}
}
}
}
void processBirds(ptree subtree)
{
BOOST_FOREACH(ptree::value_type birdsChild,subtree.get_child(""))
{
//processing attributes of element birds
if(birdsChild.first=="<xmlattr>")
{
BOOST_FOREACH(ptree::value_type birdsAttr,birdsChild.second.get_child(""))
{
cout<<birdsAttr.first<<"="<<birdsAttr.second.data()<<endl;
}
}
//processing child element of birds(bird)
else if(birdsChild.first=="bird")
{
BOOST_FOREACH(ptree::value_type birdChild,birdsChild.second.get_child(""))
{
//processing attributes of element bird
if(birdChild.first=="<xmlattr>")
{
BOOST_FOREACH(ptree::value_type birdAttr,birdChild.second.get_child(""))
{
cout<<birdAttr.first<<"="<<birdAttr.second.data()<<endl;
}
}
}
}
}
}
int _tmain(int argc, _TCHAR* argv[])
{
const std::string XML_PATH1 = "C:/Users/10871/Desktop/pets.xml";
ptree pt1;
boost::property_tree::read_xml( XML_PATH1, pt1 );
cout<<"********************************************"<<endl;
BOOST_FOREACH( ptree::value_type const& topNodeChild, pt1.get_child( "pets" ) )
{
ptree subtree = topNodeChild.second;
if( topNodeChild.first == "pet" )
{
processPet(subtree);
cout<<"********************************************"<<endl;
}
else if(topNodeChild.first=="birds")
{
processBirds(subtree);
cout<<"********************************************"<<endl;
}
}
getchar();
return 0;
}
The output is shown here: [screenshot of the program's console output]