given XML like:
<a>
<result>0</result>
<data>I9C3J9N3cCTZdKGK+itJW1Q==</data>
</a>
I need to get the fact that <result> is 0 and act upon it.
I am doing:
TiXmlDocument doc;
bool bOK = doc.Parse((const char*)chunk.memory, 0, TIXML_ENCODING_UTF8);
if (bOK)
{
TiXmlHandle hDoc(&doc);
TiXmlElement *pRoot, *pParm, *pParm2;
pRoot = doc.FirstChildElement("a");
if(pRoot)
{
pParm = pRoot->FirstChildElement("result");
if (pParm)
{
if (pParm->GetText()=="0")
{
pParm2 = pRoot->NextSiblingElement("data");
if (pParm2)
{
sValue = pParm2->GetText();
std::cout << "sValue: " << sValue << std::endl;
}
}
}
}
}
I thought that GetText() was the right answer, but I am doing something wrong because I never get inside the if to check the <data> element.
Can anyone shed some light for me?
Because in your case, <data> isn't Sibling of <a>.
You're checking pRoot->NextSiblingElement("data") while you should check for pParm->NextSiblingElement("data");
You could also change it to
pParm2 = pRoot->FirstChildElement("data");
Edit:
Sorry, i thought you were referring to this if:
if (pParm2)
So, the solution could be this:
if (std::string(pParm->GetText())=="0")
or
if (strcmp(pParm->GetText(), "0"))
You choose. I prefer the first one.
Edit 2:
I'm really sorry, I was forgetting that strcmp return the first position of where the 2 strings are the same, so, in your case it should be:
if (strcmp(pParm->GetText(), "0") == 0)
You need to include <string.h> too.
Related
I am starter and right now I am trying to extract the key information from a .xml file then load them to an object of my class, for example:
Here are some information in .xml file:
<row Id="17" Phone="12468" Address="Bos" />
<row Id="242" Phone="98324" Address="Chi" Age="30"/>
<row Id="157" Phone="23268" Age="25" />
<row Id="925" Phone="54325" Address="LA" />
And my class would be:
class worker{
string ID;
string Phone;
string Address;
string Age;
}
I know the infomation would be various and if there is not that infomation of that line, we put ""(empty string) in it as return. And I know the infomation are given in the same order of the fields in class. I try to implement a function, let says extractInfo(const string& line, const string &key)
//#line: the whole line read from .xml
//#key: it would be "Id:"", "Phone:"", "Address:"" or "Age:"", so that I could reach the
// previous index of the infomation that I could extract.
extractInfo(const string& line, const string &key){
int index = line.find(key);
if(index == -1) return "";
int start = index + key.length(); //to reach the start quote
int end = start;
while(line[end] != '"'){ //to reach the end quote
end++;
}
return line.substr(start, end - start);
}
int main(){
...// for each line read from .xml, I build a new object of class worker and filling the field
worker.Id = extraInfo(line, "Id:\"");
worker.Phone = extraInfo(line, "Phone:\"");
...//etc.
...//then work on other manipulation
return 0;
}
My question are, is there any way that I could read and load the infomation from xml much more quickly through other APL or functions? That is, is there any way for me to improve this function when the .xml is a huge file with TBytes? And, is there any way that I can use less memory to, for example, find the oldest worker then print out? I know it's tough for me and I still try hard on it!
Thank all the ideas and advice in advance!
You can parse XML with existing XML parsing libraries, such as rapidxml, libxml2, etc.
Please note that for huge XML, since it need read all XML content to create the DOM tree, so the DOM method is not really suitable. you can use libxml2's xmlreader to parse each node one by one.
libxml2 xml reader
static void
streamFile(const char *filename) {
xmlTextReaderPtr reader;
int ret;
reader = xmlReaderForFile(filename, NULL, 0);
if (reader != NULL) {
ret = xmlTextReaderRead(reader);
while (ret == 1) {
const xmlChar *name = xmlTextReaderConstName(reader);
if(xmlStrEqual(BAD_CAST "row", name)) {
const xmlChar *id = xmlTextReaderGetAttribute(reader, "Id");
const xmlChar *phone = xmlTextReaderGetAttribute(reader, "Phone");
// you code here...
xmlFree(id);
xmlFree(phone);
}
ret = xmlTextReaderRead(reader);
}
xmlFreeTextReader(reader);
if (ret != 0) {
fprintf(stderr, "%s : failed to parse\n", filename);
}
} else {
fprintf(stderr, "Unable to open %s\n", filename);
}
}
And, If your XML format is always like above, you can also use std::regex_search to handle it
https://en.cppreference.com/w/cpp/regex/regex_search
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string str = R"(<row Id="17" Phone="12468" Address="Bos" />)";
std::regex regex("(\\w+)=\"(\\w+)\"");
// get all tokens
std::smatch result;
while (std::regex_search(str, result, regex))
{
std::cout << result[1] << ": " << result[2] << std::endl;
str = result.suffix().str();
}
}
So the question explains the problem...
Background:
I'm trying to solve this problem from HackerRank.
It's basically an html tag parser. Valid input guaranteed, attributes are strings only.
My Approach
I created a custom Tag class that can store a map<string,Tag> of other Tag's, as well as a map<string,string> of attributes. The parsing seems to be working correctly.
The Problem
During the querying part, I get a BAD_ACCESS error on the following query/html combo:
4 1
<a value = "GoodVal">
<b value = "BadVal" size = "10">
</b>
</a>
a.b~size
The error occurs when I try to access the b Tag from a. Specifically, it's in the t=t.tags[tag_name], Line 118 below.
Code
#include <cmath>
#include <cstdio>
#include <vector>
#include <iostream>
#include <algorithm>
#include <sstream>
#include <map>
#include <stack>
using namespace std;
class Tag {
public:
Tag(){};
Tag(string name):name(name){};
string name;
map<string,Tag> tags = map<string, Tag>();
map<string,string> attribs=map<string,string>();
};
int main() {
int lines, queries;
std::cin>>lines>>queries;
std:string str;
getline(cin, str);
stack<string> open;
auto tags = map<string, Tag>();
for (int i = 0; i < lines; i++) {
getline(cin, str);
if (str.length()>1){
// If it's not </tag>, then it's an opening tag
if (str[1] != '/') {
// Parse tag name
auto wordidx = str.find(" ");
if (wordidx == -1) {
wordidx = str.length()-1.f;
}
string name = str.substr(1,wordidx-1);
auto t = Tag(name);
string sub = str.substr(wordidx);
auto equalidx=sub.find("=");
// Parse Attributes
while (equalidx != std::string::npos) {
string key = sub.substr(1,equalidx-2);
sub = sub.substr(equalidx);
auto attrib_start = sub.find("\"");
sub = sub.substr(attrib_start+1);
auto attrib_end = sub.find("\"");
string val = sub.substr(0, attrib_end);
sub = sub.substr(attrib_end+1);
t.attribs[key] = val;
equalidx=sub.find("=");
}
// If we're in a tag, push to that, else push to the base tags
if (open.size() == 0) {
tags[name] = t;
} else {
tags[open.top()].tags[name]=t;
}
open.push(name);
} else {
// Pop the stack if we reached a closing tag
auto wordidx = str.find(">");
string name = str.substr(2,wordidx-2);
// Sanity check, but we're assuming valid input
if (name.compare(open.top())) {
cout<<"FUCK"<<name<<open.top()<<endl;
return 9;
}
open.pop();
}
} else {
std::cout<<"FUCK\n";
}
}
//
// Parse in queries
//
for (int i = 0; i < queries; i++) {
getline(cin, str);
Tag t = Tag();
bool defined = false;
auto next_dot = str.find(".");
while (next_dot!=string::npos) {
string name = str.substr(0,next_dot);
if (defined && t.tags.find(name) == t.tags.end()) {
//TAG NOT IN T
cout<<"Not Found!"<<endl;
continue;
}
t = !defined ? tags[name] : t.tags[name];
defined = true;
str = str.substr(next_dot+1);
next_dot = str.find(".");
}
auto splitter = str.find("~");
string tag_name = str.substr(0,splitter);
string attrib_name = str.substr(splitter+1);
if (!defined) {
t = tags[tag_name];
} else if (t.tags.find(tag_name) == t.tags.end()) {
//TAG NOT IN T
cout<<"Not Found!"<<endl;
continue;
} else {
t = t.tags[tag_name];
}
// T is now set, check the attribute
if (t.attribs.find(attrib_name) == t.attribs.end()) {
cout<<"Not Found!"<<endl;
} else {
cout<<t.attribs[attrib_name]<<endl;
}
}
return 0;
}
What I've tried
This is fixed by just defining Tag x = t.tags[tag_name]; in the line above as a new variable, and then doing t = x; but why is this even happening?
Also, the following query also then fails: a.b.c~height, but it fails on Line 99 when it tried to get a.tags["b"]. No idea why. I was gonna just go with the hacky fix above, but this seems like a big core issue that i'm doing wrong.
I would suggest running this on an IDE and verifying that the parsing is indeed correct.
t=t.tags[tag_name]
This expression is unsafe because you are copy-assigning an object that is owned by that object over the owning object.
Consider what happens on this line:
The map lookup is performed and returns a Tag&.
You try to copy-assign this to t, invoking the implicit copy-assigment operator.
This operator copy-assigns t.tags from the tags attribute of the copy source -- which lives in t.tags.
The result is that the object you're copying into t is destroyed in the middle of that copy. This causes undefined behavior, and an immediate crash is honestly the best possible outcome as it told you exactly where the problem was. (This kind of problem frequently manifests at some point later in the program, at which point you've lost the state necessary to figure out what caused the UB.)
One workaround would be to move the source object into a temporary and then move-assign that temporary over t:
t = Tag{std::move(t.tags[tag_name])};
This lifts the data we want to assign to t out of t before we try to put it in t. Then, when t's assignment operator goes to replace t.tags, the data you're trying to assign to t doesn't live there anymore.
However, this overall approach involves a lot of unnecessary copying. It would be better to declare t as Tag const *t; instead -- have it be a pointer to a tag. Then you can just move that pointer around to point at other tags in your data structure without making copies.
Side note: I just did this problem the other day! Here's a hint that might help you simplify things: do you actually need a structure of tags? Is there a simpler type of lookup structure that would work instead of nested tags?
I do not understand why the below code doesn't work as intended to copy the one element from doc1 to doc2:
void test_xml(){
using namespace tinyxml2;
XMLDocument doc1, doc2;
XMLPrinter printer;
doc1.LoadFile("./res/xml/basic.xml");
if(doc1.Error())
std::cout << doc1.ErrorName();
doc1.Print(&printer);
std::cout << printer.CStr(); // prints "</atag>" without quotes
printer.ClearBuffer();
doc2.InsertFirstChild(doc1.RootElement());
if(doc2.Error())
std::cout << doc2.ErrorName(); // doesn't run, there's no error
doc2.Print(&printer);
std::cout << printer.CStr(); // prints nothing, no child got inserted to doc2
std::cout << doc2.NoChildren(); //prints "1" meaning we didn't insert anything
}
Can someone point out how this can be improved?
From the TinyXml2 documentation:
InsertFirstChild ... Returns the addThis argument or 0 if the node does not belong to the same document.
Basically you can only add a node to a document if the node was manufactured by that document (with NewElement, NewText etc).
You have to walk through doc1 creating corresponding nodes as you go (with ShallowClone, and adding them to doc2. It appears there is no DeepClone to do it all for you.
At http://sourceforge.net/p/tinyxml/discussion/42748/thread/820b0377/, "practisevoodoo" suggests:
XMLNode *deepCopy( XMLNode *src, XMLDocument *destDoc )
{
XMLNode *current = src->ShallowClone( destDoc );
for( XMLNode *child=src->FirstChild(); child; child=child->NextSibling() )
{
current->InsertEndChild( deepCopy( child, destDoc ) );
}
return current;
}
I have a class in my program that uses Rapid XML to write data to file. This process works fine. However when I attempt to read the same data, my program will always be halted by internal error catching code, explaining "next sibling returned NULL but attempted to read value anyways".
if (xmlFile.good())
{
vector<char> buffer((istreambuf_iterator<char>(xmlFile)), istreambuf_iterator<char>());
buffer.push_back('\0');
doc.parse<0>(&buffer[0]);
root_node = doc.first_node("CityData");
for(xml_node<> * bound_node = root_node->first_node("Boundaries"); bound_node; bound_node = bound_node->next_sibling())
{
if (bound_node->first_attribute("enabled")->value() != NULL)
{
int enabled = atoi(bound_node->first_attribute("enabled")->value());
if (enabled == 1)
boundaries = true; // Program globals
}
}
if (boundaries)
{
for(xml_node<> * dimen_node = root_node->first_node("Dimensions"); dimen_node; dimen_node = dimen_node->next_sibling())
{
cityDim.x = atoi(dimen_node->first_attribute("x-val")->value()); // Program globals
cityDim.y = atoi(dimen_node->first_attribute("y-val")->value());
}
}
An example of how the data appears in the XML file:
<CityData version="1.0" type="Example">
<Boundaries enabled="1"/>
<Dimensions x-val="1276" y-val="688"/>
If I add a breakpoint before the either loop attempts to reiterate and look at the values, we can see they are read from the first iteration, however the end criteria for the loop appears to be incorrect and upon next_sibling() the error occurs. I cannot understand the issue, as the code for the loop was copied from an example completely unmodified (aside from variable renaming) and appears correct to me (modifying it to node != NULL) does not help.
In the bound_node-loop the variable bound_node first points to <Boundaries enabled="1"> and you are able to read the attribute with the name enabled. After the call to next_sibling(), bound_node points to <Dimensions .../> and the call to first_attribute("enabled") will return a null pointer because this xml-node does not have an attribute with this name and the subsequent call to value() will cause the program to crash.
I do not understand why you are writing a loop over all nodes. If the xml-file looks like this
<CityData version="1.0" type="Example">
<Boundaries enabled="1"/>
<Dimensions x-val="1276" y-val="688"/>
</CityData>
Then you can extract the values like this:
xml_node<> const * bound_node = root_node->first_node("Boundaries");
if (bound_node)
{
xml_attribute<> const * attr_enabled = bound_node->first_attribute("enabled");
if (attr_enabled)
{
int enabled = atoi(attr_enabled->value());
if (enabled == 1)
boundaries = true;
}
}
if (boundaries)
{
xml_node<> const * dimen_node = root_node->first_node("Dimensions");
if (dimen_node)
{
xml_attribute<> const * xval = dimen_node->first_attribute("x-val");
xml_attribute<> const * yval = dimen_node->first_attribute("y-val");
if (xval && yval)
{
cityDim.x = atoi(xval->value());
cityDim.y = atoi(yval->value());
}
}
}
}
You can write some else-clauses to signal errors, of course.
I decided to use libxml2 parser for my qt application and im stuck on xpath expressions. I found an example class and methods, and modified this a bit for my needs. The code
QStringList* LibXml2Reader::XPathParsing(QXmlInputSource input)
{
xmlInitParser();
xmlDocPtr doc;
xmlXPathContextPtr xpathCtx;
xmlXPathObjectPtr xpathObj;
QStringList *valList =NULL;
QByteArray arr = input.data().toUtf8(); //convert input data to utf8
int length = arr.length();
const char* data = arr.data();
doc = xmlRecoverMemory(data,length); // build a tree, ignoring the errors
if(doc == NULL) { return NULL;}
xpathCtx = xmlXPathNewContext(doc);
if(xpathCtx == NULL)
{
xmlFreeDoc(doc);
xmlCleanupParser();
return NULL;
}
xpathObj = xmlXPathEvalExpression(BAD_CAST "//[#class='b-domik__nojs']", xpathCtx); //heres the parsing fails
if(xpathObj == NULL)
{
xmlXPathFreeContext(xpathCtx);
xmlFreeDoc(doc);
xmlCleanupParser();
return NULL;
}
xmlNodeSetPtr nodes = xpathObj->nodesetval;
int size = (nodes) ? nodes->nodeNr : 0;
if(size==0)
{
xmlXPathFreeContext(xpathCtx);
xmlFreeDoc(doc);
xmlCleanupParser();
return NULL;
}
valList = new QStringList();
for (int i = 0; i < size; i++)
{
xmlNodePtr current = nodes->nodeTab[i];
const char* str = (const char*)current->content;
qDebug() << "name: " << QString::fromLocal8Bit((const char*)current->name);
qDebug() << "content: " << QString::fromLocal8Bit((const char*)current->content) << "\r\n";
valList->append(QString::fromLocal8Bit(str));
}
xmlXPathFreeObject(xpathObj);
xmlXPathFreeContext(xpathCtx);
xmlFreeDoc(doc);
xmlCleanupParser();
return valList;
}
As an example im making a request to http://yandex.ru/ and trying to get the node with class b-domik__nojs which is basically one div.
xpathObj = xmlXPathEvalExpression(BAD_CAST "//[#class='b-domik__nojs']", xpathCtx); //heres the parsing fails
the problem is the expression //[#class='b-domik__nojs'] doesn't work at all. I checked it in firefox xpath ext., and in opera developer tools xpath ext. in there this expression works perfectly.
I also tried to get other nodes with attributes but for some reason xpath for ANY attribute fails. Is there something wrong in my method? Also when i load a tree using xmlRecover, it gives me a lot of parser errors in debug output.
Ok i played a bit with my libxml2 function more and used "//*" expression to get all elements in the document, but! It returns me only the elements in the first children node of the body tag. This is the yandex.ru dom tree
so basically it gets ALL the elements in the first div "div class="b-line b-line_bar", but doesnt look for the other elements in other child nodes of the <body> for some reason.
Why can that happen? Maybe xmlParseMemory doesnt build a full tree for some reason? Is there any possible solution to fix this.
It is really strange that the expression works anywhere, because it is not a valid XPath expression. After the axis specification (//), there should be a nodetest (element name or *) before the predicate (the condition in square brackets).
//*[#class='bdomik__nojs']
Allright it works now, if my mistake was to use xml functions to make html documents into a tree. I used htmlReadMemory and the tree is fully built now. Some code again
xmlInitParser();
xmlDocPtr doc;
xmlXPathContextPtr xpathCtx;
xmlXPathObjectPtr xpathObj;
QByteArray arr = input.data().toUtf8();
int length = arr.length();
const char* data = arr.data();
doc = htmlReadMemory(data,length,"",NULL,HTML_PARSE_RECOVER);
if(doc == NULL) { return NULL;}
xpathCtx = xmlXPathNewContext(doc);
if(xpathCtx == NULL)
{
xmlFreeDoc(doc);
xmlCleanupParser();
return NULL;
}
xpathObj = xmlXPathEvalExpression(BAD_CAST "//*[#class='b-domik__nojs']", xpathCtx);
etc.