Boost and xml parsing - c++

I have the following XML data, and I want to parse it with the Boost XML parser.
<?xml version="1.0" encoding="UTF-8"?>
<applications>
<application>
<id>1</id>
<platform>linux-x64</platform>
<version>2.4</version>
</application>
<application>
<id>2</id>
<platform>windows</platform>
<version>2.5</version>
</application>
<application>
<id>3</id>
<platform>linux</platform>
<version>2.6</version>
</application>
</applications>
I have written the Boost code below, but it reads only the first child of "applications" and never reaches the other two. Every time, the inner loop gets the data of the first child.
boost::property_tree::ptree pt;
boost::property_tree::read_xml(sModel, pt); // sModel is filename which contains above xml data
BOOST_FOREACH(boost::property_tree::ptree::value_type &v, pt.get_child("applications"))
{
std::string key = v.first.data();
std::string pkgId, platform, version;
if (key == std::string("application"))
{
BOOST_FOREACH(boost::property_tree::ptree::value_type &v_, pt.get_child("applications.application"))
{
std::string app_key = v_.first.data();
std::string app_value = v_.second.data();
if (app_key == std::string("id"))
pkgId = app_value;
else if (app_key == std::string("platform"))
platform = app_value;
else if (app_key == std::string("version"))
version = app_value;
}
}
}
Here, I get the platform as "linux-x64" every time.
Can someone explain how to read all the children with the Boost XML parser?
Thanks in advance.

get_child (and all the other path-based access functions) isn't very good at dealing with multiple identical keys. It will choose the first child with the given key and return that, ignoring all others.
But you don't need get_child, because you already hold the node you want in your hand.
pt.get_child("applications") gives you a ptree. Iterating over that gives you a ptree::value_type, which is a std::pair<std::string, ptree>.
The first weird thing, then, is this line:
std::string key = v.first.data();
The data() function you're calling here is std::string::data, not ptree::data. You could just write
std::string key = v.first;
The next strange thing is the comparison:
if (key == std::string("application"))
You don't need to explicitly construct a std::string here. In fact, doing so is a pessimization, because it has to allocate a string buffer and copy the string there, when std::string has comparison operators for C-style strings.
Then you iterate over pt.get_child("applications.application"), but you don't need to do this lookup: v.second is already the tree you want.
Furthermore, you don't need to iterate over the child at all, you can use its lookup functions to get what you need.
std::string pkgId = v.second.get("id", "");
So to sum up, this is the code I would write:
boost::property_tree::ptree pt;
boost::property_tree::read_xml(sModel, pt);
BOOST_FOREACH(boost::property_tree::ptree::value_type &v, pt.get_child("applications"))
{
// You can even omit this check if you can rely on all children
// being application nodes.
if (v.first == "application")
{
std::string pkgId = v.second.get("id", "");
std::string platform = v.second.get("platform", "");
std::string version = v.second.get("version", "");
}
}

Check this example:
#include <boost/property_tree/xml_parser.hpp>
#include <boost/property_tree/ptree.hpp>
#include <boost/foreach.hpp>
#include <string>
#include <vector>
struct Application
{
    int m_id;
    std::string m_platform;
    float m_version;
};
typedef std::vector<Application> AppList;
AppList Read()
{
    using boost::property_tree::ptree;
    // Populate tree structure (pt):
    ptree pt;
    read_xml("applications.xml", pt); // For example.
    // Traverse pt:
    AppList List;
    BOOST_FOREACH(ptree::value_type const& v, pt.get_child("applications"))
    {
        if (v.first == "application")
        {
            // get<T>() without a default throws if the child is missing
            // or cannot be converted to T.
            Application App;
            App.m_id = v.second.get<int>("id");
            App.m_platform = v.second.get<std::string>("platform");
            App.m_version = v.second.get<float>("version");
            List.push_back(App);
        }
    }
    return List;
}

Related

Quick way to extract the information from .xml files to the object

I am a beginner, and right now I am trying to extract the key information from an .xml file and load it into an object of my class. For example:
Here is some of the information in the .xml file:
<row Id="17" Phone="12468" Address="Bos" />
<row Id="242" Phone="98324" Address="Chi" Age="30"/>
<row Id="157" Phone="23268" Age="25" />
<row Id="925" Phone="54325" Address="LA" />
And my class would be:
class worker{
public:
string Id;
string Phone;
string Address;
string Age;
};
I know the information varies from line to line; if a line lacks a field, we store "" (an empty string) for it. I also know the attributes appear in the same order as the fields in the class. I am trying to implement a function, say extractInfo(const string& line, const string& key):
// line: the whole line read from the .xml file
// key: "Id=\"", "Phone=\"", "Address=\"" or "Age=\"", i.e. the attribute name up to
//      and including the opening quote, so that the value starts right after it.
string extractInfo(const string& line, const string &key){
    size_t index = line.find(key);
    if(index == string::npos) return "";
    size_t start = index + key.length(); // first character after the opening quote
    size_t end = line.find('"', start);  // position of the closing quote
    if(end == string::npos) return "";   // malformed line: no closing quote
    return line.substr(start, end - start);
}
int main(){
    ...// for each line read from the .xml file, I build a new worker object and fill its fields
    worker w;
    w.Id = extractInfo(line, "Id=\"");
    w.Phone = extractInfo(line, "Phone=\"");
    ...//etc.
    ...//then work on other manipulation
    return 0;
}
My questions are: is there any way to read and load the information from XML much more quickly through another API or function? That is, can I improve this function when the .xml file is huge (terabytes)? And is there any way to use less memory to, for example, find the oldest worker and print it out? I know it's tough for me, and I'm still working hard on it!
Thanks in advance for all ideas and advice!
You can parse XML with an existing XML parsing library, such as rapidxml or libxml2.
Note that the DOM approach is not really suitable for huge XML files, because it has to read the entire content to build the DOM tree. Instead, you can use libxml2's xmlreader to parse the nodes one by one.
libxml2 xmlreader:
#include <stdio.h>
#include <libxml/xmlreader.h>
static void
streamFile(const char *filename) {
    xmlTextReaderPtr reader;
    int ret;
    reader = xmlReaderForFile(filename, NULL, 0);
    if (reader != NULL) {
        ret = xmlTextReaderRead(reader);
        while (ret == 1) {
            const xmlChar *name = xmlTextReaderConstName(reader);
            if (xmlStrEqual(BAD_CAST "row", name)) {
                // Each call returns a newly allocated copy that the caller must free,
                // or NULL if the attribute is absent.
                xmlChar *id = xmlTextReaderGetAttribute(reader, BAD_CAST "Id");
                xmlChar *phone = xmlTextReaderGetAttribute(reader, BAD_CAST "Phone");
                // your code here...
                xmlFree(id);
                xmlFree(phone);
            }
            ret = xmlTextReaderRead(reader);
        }
        xmlFreeTextReader(reader);
        if (ret != 0) {
            fprintf(stderr, "%s : failed to parse\n", filename);
        }
    } else {
        fprintf(stderr, "Unable to open %s\n", filename);
    }
}
And if your XML format is always like the above, you can also use std::regex_search to handle it:
https://en.cppreference.com/w/cpp/regex/regex_search
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string str = R"(<row Id="17" Phone="12468" Address="Bos" />)";
std::regex regex("(\\w+)=\"(\\w+)\"");
// get all tokens
std::smatch result;
while (std::regex_search(str, result, regex))
{
std::cout << result[1] << ": " << result[2] << std::endl;
str = result.suffix().str();
}
}
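For the "oldest worker with little memory" part of the question, the same regex idea can be applied one line at a time, so only the current line and the best candidate so far are held in memory. This is just a sketch under some assumptions: every row sits on its own line, Age (when present) is a plain integer, and the Worker struct and the function names are invented for this example. std::regex is not known for speed, so for a really huge file a hand-written scanner or a streaming XML reader may well be faster.

```cpp
#include <istream>
#include <regex>
#include <string>

// Hypothetical record type mirroring the question's class; missing attributes stay empty.
struct Worker {
    std::string id, phone, address, age;
};

// Extracts the name="value" pairs from one <row ... /> line.
inline Worker parse_row(const std::string& line)
{
    static const std::regex attr("(\\w+)=\"([^\"]*)\"");
    Worker w;
    for (std::sregex_iterator it(line.begin(), line.end(), attr), end; it != end; ++it) {
        const std::string name = (*it)[1].str();
        const std::string value = (*it)[2].str();
        if (name == "Id") w.id = value;
        else if (name == "Phone") w.phone = value;
        else if (name == "Address") w.address = value;
        else if (name == "Age") w.age = value;
    }
    return w;
}

// Reads the input line by line and keeps only the oldest worker seen so far.
// Rows without an Age attribute are skipped. Assumption: Age is a valid integer;
// std::stoi throws otherwise.
inline Worker find_oldest(std::istream& in)
{
    Worker oldest;
    int best_age = -1;
    std::string line;
    while (std::getline(in, line)) {
        Worker w = parse_row(line);
        if (w.age.empty()) continue;
        const int age = std::stoi(w.age);
        if (age > best_age) {
            best_age = age;
            oldest = w;
        }
    }
    return oldest;
}
```

find_oldest takes any std::istream, so it can be fed a std::ifstream opened on the real file or a std::istringstream for a quick check. If no row has an Age, it returns a Worker with all fields empty.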

C++/RapidXML: Edit node and write to a new XML file doesn't have the updated nodes

I'm parsing an XML file from a string.
My node Id is bar, and I want to change it to foo and then write the document to a file.
After writing, the file still has bar instead of foo.
#include "rapidxml.hpp"
#include "rapidxml_print.hpp"
void main()
{
std::string newXml = "<?xml version=\"1.0\" encoding=\"UTF - 8\"?><Parent><FileId>fileID</FileId><IniVersion>2.0.0</IniVersion><Child><Id>bar</Id></Child></Parent>";
xml_document<> doc;
xml_node<> * root_node;
std::string str = newXml;
std::vector<char> buffer(str.begin(), str.end());
buffer.push_back('\0');
doc.parse<0>(&buffer[0]);
root_node = doc.first_node("Parent");
xml_node<> * node = root_node->first_node("Child");
xml_node<> * xml = node->first_node("Id");
xml->value("foo"); // I want to change my id from bar to foo!!!!
std::ofstream outFile("output.xml");
outFile << doc; // after I write to file, I still see the ID as bar
}
What am I missing here?
The issue is in the layout of the data. Under the node_element node xml there is another node, of type node_data, that contains "bar".
Your posted code also does not compile. Here is a version that compiles and shows the fix:
#include <string>
#include <vector>
#include <iostream>
#include "rapidxml.hpp"
#include "rapidxml_print.hpp"
int main()
{
std::string newXml = "<?xml version=\"1.0\" encoding=\"UTF - 8\"?><Parent><FileId>fileID</FileId><IniVersion>2.0.0</IniVersion><Child><Id>bar</Id></Child></Parent>";
rapidxml::xml_document<> doc;
std::string str = newXml;
std::vector<char> buffer(str.begin(), str.end());
buffer.push_back('\0');
doc.parse<0>(&buffer[0]);
rapidxml::xml_node<>* root_node = doc.first_node("Parent");
rapidxml::xml_node<>* node = root_node->first_node("Child");
rapidxml::xml_node<>* xml = node->first_node("Id");
// xml->value("foo"); // does change something that isn't output!!!!
rapidxml::xml_node<> *real_thing = xml->first_node();
if (real_thing != nullptr // these checks just demonstrate that
&& real_thing->next_sibling() == nullptr // it is there and how it is located
&& real_thing->type() == rapidxml::node_data) // when element does contain text data
{
real_thing->value("yuck"); // now that should work
}
std::cout << doc; // lets see it
}
And so it outputs:
<Parent>
<FileId>fileID</FileId>
<IniVersion>2.0.0</IniVersion>
<Child>
<Id>yuck</Id>
</Child>
</Parent>
See? Note that how the data is laid out during parsing depends on the flags you pass to parse. For example, if you call doc.parse<rapidxml::parse_fastest> instead, the parser will not create such node_data nodes, so changing the node_element value (as you first tried) will work, and what I did above will not. See the manual for details.

How to parse nested arrays inside json using C++

I know how to parse "normal" looking JSON data in C++. Usually I do this with boost::property_tree and the read_json method. It may look like this:
BOOST_FOREACH(ptree::value_type &v, pt.get_child("rows")){
vec.push_back(v.second.get<std::string>("key"));
}
and the code above corresponds to this JSON file:
{
"rows":[{
"key":"1"
},{
"key":"2"
}]
}
However, the Neo4j result-set that I get, looks like:
{
"columns":{...},
"data":[[["object 1"]], [["object 2"]], [["object 3"]]]
}
I want to parse the "data" node. I tried to do it like this:
BOOST_FOREACH(ptree::value_type &v, pt.get_child("data")){
vec.push_back(v.second.data());
}
but this does not work. I do not get an error, but my vector vec is populated with empty values: when I iterate through vec I see the right number of elements, but none of them has a value. I want to get the values "object 1", "object 2", "object 3" instead.
The solution looks like this:
using boost::property_tree::ptree;
ptree pt;
//... populate ptree pt with data from some source
BOOST_FOREACH(ptree::value_type &v, pt.get_child("data")){
    ptree &subtree1 = v.second;        // e.g. [["object 1"]]
    BOOST_FOREACH(ptree::value_type &vs, subtree1){
        ptree &subtree2 = vs.second;   // e.g. ["object 1"]
        BOOST_FOREACH(ptree::value_type &vs2, subtree2){
            do_something(vs2.second.data()); // e.g. "object 1"
        }
    }
}
This code makes it possible to parse a JSON structure like this:
{
"data":[[["object 1"]], [["object 2"]], [["object 3"]]]
}
So, contrary to what some people are saying, actually, there is no need to use other third-party libraries. Use just boost and you are done.
This is an example of how I do it. You have to know the JSON structure ahead of time.
#include <boost/lexical_cast.hpp>
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/json_parser.hpp>
#include <sstream>
#include <string>
int main()
{
    boost::property_tree::ptree pt, sub_pt;
    std::string json_str, key, sub_key;
    std::stringstream ss;
    int value = 0;
    json_str = "{\"arduino_1\": {\"bus_1\": 17425,\"bus_2\": 1025,\"bus_3\": 0,\"bus_4\": 0,\"bus_5\": 0,\"bus_6\": 0,\"bus_7\": 0,\"bus_8\": 0}}";
    ss << json_str; // put string into stringstream
    boost::property_tree::read_json(ss, pt); // put stringstream into property tree
    for (boost::property_tree::ptree::iterator iter = pt.begin(); iter != pt.end(); ++iter)
    {
        // get data: the key is "arduino_1", the value is its subtree
        key = iter->first;
        sub_pt = iter->second;
        // iterate over subtree
        for (boost::property_tree::ptree::iterator sub_iter = sub_pt.begin(); sub_iter != sub_pt.end(); ++sub_iter)
        {
            // get data: e.g. "bus_1" and 17425
            sub_key = sub_iter->first;
            value = boost::lexical_cast<int>(sub_iter->second.data());
        }
    }
    return 0;
}

How to read through nodes using pugixml?

I have just downloaded the pugixml library and I am trying to adapt it to my needs. It is mostly oriented toward DOM-style processing, which I am not using. The data I store looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<profile>
<points>
<point>
<index>0</index>
<x>0</x>
<y>50</y>
</point>
<point>
<index>1</index>
<x>2</x>
<y>49.9583</y>
</point>
<point>
<index>2</index>
<x>12</x>
<y>50.3083</y>
</point>
</points>
</profile>
Pugixml guide says:
It is common to store data as text contents of some node - i.e.
<node><description>This is a node</description></node>. In this case,
<description> node does not have a value, but instead has a child of
type node_pcdata with value "This is a node". pugixml provides
child_value() and text() helper functions to parse such data.
But I am having trouble using those methods; I am not getting the node values out.
#include "pugixml.hpp"
#include <string.h>
#include <iostream>
int main()
{
pugi::xml_document doc;
if (!doc.load_file("/home/lukasz/Programy/eclipse_linux_projects/xmlTest/Debug/pidtest.xml"))
return -1;
pugi::xml_node points = doc.child("profile").child("points");
for (pugi::xml_node point = points.first_child(); point; point = point.next_sibling())
{
// ?
}
return 0;
}
How do I read out the index, x and y values inside the for loop? I would appreciate any help.
There are several ways, documented on the quickstart page:
http://pugixml.org/docs/samples/traverse_iter.cpp
http://pugixml.org/docs/samples/traverse_rangefor.cpp
There is also a tree visitor for the heavier jobs: http://pugixml.org/docs/samples/traverse_walker.cpp
May I suggest XPath?
#include <pugixml.hpp>
#include <iostream>
int main()
{
pugi::xml_document doc;
if (doc.load_file("input.txt")) {
for (auto point : doc.select_nodes("//profile/points/point")) {
point.node().print(std::cout, "", pugi::format_raw);
std::cout << "\n";
}
}
}
Prints
<point><index>0</index><x>0</x><y>50</y></point>
<point><index>1</index><x>2</x><y>49.9583</y></point>
<point><index>2</index><x>12</x><y>50.3083</y></point>

How to read Boost property_map with peer values with identical tag names

I am using the property_map from Boost C++ library v1.53, and it's working great for me, except that I can't figure out how to parse data nodes with the same name that are peers of each other, as in the following XML:
<RECORDSET>
<C>
<CY>
<CZ>
<I>1</I>
<CT>10</CT>
</CZ>
<CZ>
<I>2</I>
<CT>12</CT>
</CZ>
</CY>
<CS>
<I>1</I>
<I>2</I>
</CS>
</C>
</RECORDSET>
I can parse everything above except the "I" data node elements under the "CS" tag at the bottom. I am trying to use the code:
// (works no problem)
BOOST_FOREACH(const ptree::value_type & vpairC, proptreeCs.get_child(string("RECORDSET")))
{
if (vpairC.first.data() != std::string("C"))
continue;
// grab the "C" ptree sub-tree for readability.
ptree proptreeC = vpairC.second;
// (this works no problem to iterate through the multiple CZ nodes under CY)
// RM_CZs
short nCZCount = 0;
std::string sTagName;
BOOST_FOREACH(const ptree::value_type & vpairCZ, proptreeC.get_child("CY"))
{
// get a local ptree for readability.
ptree ptreeCZ = vpairCZ.second;
// get the I and CT ids.
sTagName = "I";
long lId = ptreeCZ.get<long>(sTagName);
sTagName = "CT";
long lCT = ptreeCZ.get<long>(sTagName);
// do something with id and ct...
// increment the count.
nCZCount++;
}
// nCZCount ends up set to 2 based on input XML above
// (this loop does NOT work)
sTagName = "CS";
const ptree proptreeCS = proptreeC.get_child(sTagName);
// (this does NOT work to iterate through <I> values under the <CS> node)
sTagName = "I";
BOOST_FOREACH(const ptree::value_type & vpairCS,
proptreeCS.get_child(sTagName))
{
// check to see if this is a "I" value; if not skip it.
if (vpairCS.first.data() != sTagName)
continue;
long lId = atol(vpairCS.second.data().c_str());
// do something with id...
}
// the above loop does NOT execute one time.
}
So how can I iterate through the "I" value peers under the "CS" node?
In the code in my question, I was asking for children too low in the tree. Here is the loop that will retrieve the "I" values from the "CS" node (replaces the last BOOST_FOREACH in the code in my question):
BOOST_FOREACH(const ptree::value_type & vpairI, proptreeC.get_child(std::string("CS")))
{
    // check to see if this is an "I" value; if not skip it.
    if (vpairI.first.data() != std::string("I"))
        continue;
    long lId = atol(vpairI.second.data().c_str());
    // do something with lId...
}