libxml2 xpath parsing, doesn't work as expected - c++

I decided to use the libxml2 parser for my Qt application, and I'm stuck on XPath expressions. I found an example class and methods and modified them a bit for my needs. The code:
QStringList* LibXml2Reader::XPathParsing(QXmlInputSource input)
{
xmlInitParser();
xmlDocPtr doc;
xmlXPathContextPtr xpathCtx;
xmlXPathObjectPtr xpathObj;
QStringList *valList =NULL;
QByteArray arr = input.data().toUtf8(); //convert input data to utf8
int length = arr.length();
const char* data = arr.data();
doc = xmlRecoverMemory(data,length); // build a tree, ignoring the errors
if(doc == NULL) { return NULL;}
xpathCtx = xmlXPathNewContext(doc);
if(xpathCtx == NULL)
{
xmlFreeDoc(doc);
xmlCleanupParser();
return NULL;
}
xpathObj = xmlXPathEvalExpression(BAD_CAST "//[@class='b-domik__nojs']", xpathCtx); //here's where the parsing fails
if(xpathObj == NULL)
{
xmlXPathFreeContext(xpathCtx);
xmlFreeDoc(doc);
xmlCleanupParser();
return NULL;
}
xmlNodeSetPtr nodes = xpathObj->nodesetval;
int size = (nodes) ? nodes->nodeNr : 0;
if(size==0)
{
xmlXPathFreeContext(xpathCtx);
xmlFreeDoc(doc);
xmlCleanupParser();
return NULL;
}
valList = new QStringList();
for (int i = 0; i < size; i++)
{
xmlNodePtr current = nodes->nodeTab[i];
const char* str = (const char*)current->content;
qDebug() << "name: " << QString::fromLocal8Bit((const char*)current->name);
qDebug() << "content: " << QString::fromLocal8Bit((const char*)current->content) << "\r\n";
valList->append(QString::fromLocal8Bit(str));
}
xmlXPathFreeObject(xpathObj);
xmlXPathFreeContext(xpathCtx);
xmlFreeDoc(doc);
xmlCleanupParser();
return valList;
}
As an example, I'm making a request to http://yandex.ru/ and trying to get the node with class b-domik__nojs, which is basically one div.
xpathObj = xmlXPathEvalExpression(BAD_CAST "//[@class='b-domik__nojs']", xpathCtx); //here's where the parsing fails
The problem is that the expression //[@class='b-domik__nojs'] doesn't work at all. I checked it in a Firefox XPath extension and in the XPath extension of Opera's developer tools, and there it works perfectly.
I also tried to get other nodes by attribute, but for some reason XPath fails for ANY attribute. Is there something wrong in my method? Also, when I load a tree using xmlRecoverMemory, it gives me a lot of parser errors in the debug output.
OK, I played a bit more with my libxml2 function and used the "//*" expression to get all elements in the document, but it returns only the elements in the first child node of the body tag. This is the yandex.ru DOM tree:
So basically it gets ALL the elements in the first div, <div class="b-line b-line_bar">, but for some reason doesn't look at the elements in the other child nodes of <body>.
Why can that happen? Maybe xmlParseMemory doesn't build a full tree for some reason? Is there any way to fix this?

It is really strange that the expression works anywhere, because it is not a valid XPath expression. After the axis specification (//), there must be a node test (an element name or *) before the predicate (the condition in square brackets):
//*[@class='b-domik__nojs']
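Note that @class='b-domik__nojs' is an exact string comparison: it only matches when the whole class attribute equals that string. If the div carries more than one class (say a hypothetical class="b-domik b-domik__nojs"), a common XPath 1.0 idiom is //*[contains(concat(' ', normalize-space(@class), ' '), ' b-domik__nojs ')]. The sketch below mirrors both tests in plain C++ so the difference is easy to check; it does not call libxml2, and the function names are made up for illustration.

```cpp
#include <string>

// Mirrors XPath 1.0 normalize-space(): drops leading/trailing whitespace and
// collapses internal runs of space, tab, CR and LF into a single space.
std::string normalizeSpace(const std::string& s) {
    std::string out;
    bool pendingSpace = false;
    for (char c : s) {
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
            pendingSpace = !out.empty();  // leading whitespace is ignored
        } else {
            if (pendingSpace) out += ' ';
            pendingSpace = false;
            out += c;
        }
    }
    return out;  // trailing whitespace is never emitted
}

// Same test as //*[@class='name']: the whole attribute must match.
bool exactClassMatch(const std::string& classAttr, const std::string& name) {
    return classAttr == name;
}

// Same test as the XPath 1.0 idiom
// contains(concat(' ', normalize-space(@class), ' '), ' name ')
bool hasClassToken(const std::string& classAttr, const std::string& name) {
    const std::string haystack = " " + normalizeSpace(classAttr) + " ";
    return haystack.find(" " + name + " ") != std::string::npos;
}
```

The surrounding spaces are what stop a class like b-domik__nojs-extra from counting as a match.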

All right, it works now: my mistake was using the XML functions to build a tree from an HTML document. I switched to htmlReadMemory and the tree is fully built now. Some code again:
xmlInitParser();
xmlDocPtr doc;
xmlXPathContextPtr xpathCtx;
xmlXPathObjectPtr xpathObj;
QByteArray arr = input.data().toUtf8();
int length = arr.length();
const char* data = arr.data();
doc = htmlReadMemory(data,length,"",NULL,HTML_PARSE_RECOVER);
if(doc == NULL) { return NULL;}
xpathCtx = xmlXPathNewContext(doc);
if(xpathCtx == NULL)
{
xmlFreeDoc(doc);
xmlCleanupParser();
return NULL;
}
xpathObj = xmlXPathEvalExpression(BAD_CAST "//*[@class='b-domik__nojs']", xpathCtx);
etc.
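Every early return in this function repeats the same free calls, which is easy to get wrong as the function grows. A common C++ way to avoid that is std::unique_ptr with a custom deleter, so each resource is released on every return path, in reverse order of acquisition. The sketch below shows the pattern on hypothetical stand-in types (Doc, Ctx, fakeReadDoc and the other fake* functions are made up, and they count their frees so the behaviour can be checked); in real code the deleters would be libxml2's xmlFreeDoc, xmlXPathFreeContext and xmlXPathFreeObject. Separately, the libxml2 documentation says xmlCleanupParser() should only be called once, when the process has finished using the library, not after every query, so it is deliberately left out here.

```cpp
#include <memory>

// Hypothetical stand-ins for a C API such as libxml2's. Each "free"
// bumps a counter so the cleanup can be observed.
struct Doc {};
struct Ctx {};
int g_freed = 0;

Doc* fakeReadDoc(bool ok) { return ok ? new Doc : nullptr; }
Ctx* fakeNewContext(Doc*) { return new Ctx; }
void fakeFreeDoc(Doc* d) { delete d; ++g_freed; }
void fakeFreeContext(Ctx* c) { delete c; ++g_freed; }

// unique_ptr with a function-pointer deleter; the deleter only runs
// for non-null pointers.
using DocPtr = std::unique_ptr<Doc, void (*)(Doc*)>;
using CtxPtr = std::unique_ptr<Ctx, void (*)(Ctx*)>;

// Returns 0 on success or -1 on failure.
int parseAndQuery(bool docOk, bool takeEarlyExit) {
    DocPtr doc(fakeReadDoc(docOk), fakeFreeDoc);
    if (!doc) return -1;          // nothing was allocated, nothing to free
    CtxPtr ctx(fakeNewContext(doc.get()), fakeFreeContext);
    if (!ctx) return -1;          // doc is freed automatically
    if (takeEarlyExit) return 0;  // ctx, then doc, freed here
    return 0;                     // same cleanup on the normal path
}
```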

Related

What is the alternative to `TiXmlNode::FirstChild(const char *)` in TinyXML-2?

I am updating code that uses the legacy TinyXml library, to use new TinyXML-2 version instead.
While editing, I noticed that the function TiXmlNode::FirstChild(const char *) has no direct replacement in TinyXML-2.
My questions are:
Is there a convenient replacement for the aforementioned function that I missed?
In case there isn't, how should the example code below be updated for TinyXML-2?
// TiXmlElement *element; // assume this was correctly loaded
TiXmlNode *node;
if ((node = element->FirstChild("example")) != nullptr)
{
for (TiXmlElement *walk = node->FirstChildElement();
walk != nullptr;
walk = walk->NextSiblingElement())
{
// ...
}
}
tinyxml2 has
const XMLElement * XMLNode::FirstChildElement (const char *value=0) const
Your code block is much the same:
if (auto example = element->FirstChildElement("example"))
{
for (auto walk = example->FirstChildElement();
walk;
walk = walk->NextSiblingElement())
{
// walk the walk
}
}
Or you might look at my add-on for tinyxml2 with which your snippet would be:
for (auto walk : selection (element, "example/"))
{
// walk the walk
}

Why is this Haxe try-catch block still crashing, when using Release mode for C++ target

I have a HaxeFlixel project that works fine in Debug mode on several targets, including Flash, Neko and Windows. But when targeting Windows in Release mode, I get an unexpected crash, and surprisingly it happens inside a try-catch block. Here's the crashing function:
/**
* Will safely scan a parent node's children, search for a child by name, and return its text.
* @param parent a Fast object that is the parent of the `nodeName` node
* @param nodeName the node's name or a comma-separated path to the child (will scan recursively)
* @return the node's text as a String, or null if the child is not there
*/
public static function getNodeText(parent:Fast, nodeName:String):String {
try {
var _node : Fast = getNodeNamed(parent, nodeName);
//if (_node == null)
// return null;
// next line will crash if _node is null
var it :Iterator<Xml> = _node.x.iterator();
if ( it == null || !it.hasNext() )
return null;
var v = it.next();
var n = it.next();
if( n != null ) {
if( v.nodeType == Xml.PCData && n.nodeType == Xml.CData && StringTools.trim(v.nodeValue) == "" ) {
var n2 = it.next();
if( n2 == null || (n2.nodeType == Xml.PCData && StringTools.trim(n2.nodeValue) == "" && it.next() == null) )
return n.nodeValue;
}
//does not only have data (has children)
return null;
}
if( v.nodeType != Xml.PCData && v.nodeType != Xml.CData )
//does not have data
return null;
return v.nodeValue;
}catch (err:Dynamic) {
trace("Failed parsing node Text [" + nodeName+"] " + err );
return null;
}
}
By enabling the if (_node == null) return null; line, it works safely again. I thought that by catching errors as Dynamic I would catch every possible error type! Why is this happening? And why does it only appear in Release mode?
My IDE is FlashDevelop, and I'm using HaxeFlixel 3.3.6, lime 0.9.7 and openFL 1.4.0, if that makes any difference
EDIT: I suspect this has to do with the translated C++ code missing the Dynamic exception. The equivalent generated C++ code is:
STATIC_HX_DEFINE_DYNAMIC_FUNC2(BaxXML_obj,_getNodeNamed,return )
::String BaxXML_obj::getNodeText( ::haxe::xml::Fast parent,::String nodeName){
HX_STACK_FRAME("bax.utils.BaxXML","getNodeText",0x4a152f07,"bax.utils.BaxXML.getNodeText","bax/utils/BaxXML.hx",56,0xf6e2d3cc)
HX_STACK_ARG(parent,"parent")
HX_STACK_ARG(nodeName,"nodeName")
HX_STACK_LINE(56)
try
{
HX_STACK_CATCHABLE(Dynamic, 0);
{
HX_STACK_LINE(57)
::haxe::xml::Fast _node = ::bax::utils::BaxXML_obj::getNodeNamed(parent,nodeName); HX_STACK_VAR(_node,"_node");
HX_STACK_LINE(63)
Dynamic it = _node->x->iterator(); HX_STACK_VAR(it,"it");
// ... Let's skip the irrelevant code
}
catch(Dynamic __e){
{
HX_STACK_BEGIN_CATCH
Dynamic err = __e;{
HX_STACK_LINE(82)
::String _g5 = ::Std_obj::string(err); HX_STACK_VAR(_g5,"_g5");
HX_STACK_LINE(82)
::String _g6 = (((HX_CSTRING("Failed parsing node Text [") + nodeName) + HX_CSTRING("] ")) + _g5); HX_STACK_VAR(_g6,"_g6");
HX_STACK_LINE(82)
::haxe::Log_obj::trace(_g6,hx::SourceInfo(HX_CSTRING("BaxXML.hx"),82,HX_CSTRING("bax.utils.BaxXML"),HX_CSTRING("getNodeText")));
HX_STACK_LINE(83)
return null();
}
}
}
HX_STACK_LINE(56)
return null();
}
What haxedefs do you have defined?
Adding these to your project.xml might help:
<haxedef name="HXCPP_CHECK_POINTER"/> <!--makes null references cause errors-->
<haxedef name="HXCPP_STACK_LINE" /> <!--if you want line numbers-->
<haxedef name="HXCPP_STACK_TRACE"/> <!--if you want stack traces-->
You might also try the crashdumper library:
https://github.com/larsiusprime/crashdumper
(Crashdumper turns on HXCPP_CHECK_POINTER by default as part of its include.xml, and sets up hooks for both hxcpp's errors and openfl/lime's uncaught error events.)
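I guess this boils down to how C++ handles null-pointer exceptions: it doesn't! Dereferencing a null pointer in C++ is undefined behaviour, not a thrown exception, so there is nothing for a catch block to catch.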
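Because the null access never becomes an exception, the only way to make it catchable is to check for null explicitly and throw. My understanding is that HXCPP_CHECK_POINTER makes hxcpp emit such checks so a null access turns into a catchable error; the plain-C++ sketch below only illustrates that idea. checkedDeref and Node are hypothetical names, and this is not the code hxcpp actually generates.

```cpp
#include <stdexcept>
#include <string>

// Turns a null pointer into a real C++ exception before it is dereferenced,
// so an enclosing try/catch can handle it.
template <typename T>
T& checkedDeref(T* p, const char* what) {
    if (p == nullptr)
        throw std::runtime_error(std::string("null object reference: ") + what);
    return *p;
}

struct Node { std::string text; };  // hypothetical stand-in for the Fast node

// Same shape as getNodeText: returns an empty string when the node is missing.
std::string getNodeText(Node* node) {
    try {
        return checkedDeref(node, "_node").text;
    } catch (const std::runtime_error&) {
        return "";
    }
}
```

Without the explicit check, getNodeText(nullptr) would read through a null pointer and most likely crash, exactly like the generated code above.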
That seems odd; here are some questions that may help solve it:
It looks like you are making quite a few assumptions about how the XML looks (calling it.next() manually); why is that?
Why are you using this big-ass try-catch block?
What does getNodeNamed look like? It seems it can return null.
Do you have an example xml to test with?

Issues loading data in from xml using TinyXml2

I am trying to make a function in my application that can load in an object through attributes in an xml file. I would like to use TinyXML2 as I hear it is pretty easy and quick for games.
Currently I have the following xml file:
<?xml version="1.0" encoding="UTF-8"?>
<Level>
<Pulsator starttime="0" type="0" higherradius="100" lowerradius="10" time="60" y="500" x="300" bpm="60"/>
</Level>
Each attribute of the Pulsator is a variable in my Pulsator class. I use the following function to import my Pulsators and add them to a vector of objects:
void Game::LoadLevel(string filename)
{
tinyxml2::XMLDocument level;
level.LoadFile(filename.c_str());
tinyxml2::XMLNode* root = level.FirstChild();
tinyxml2::XMLNode* childNode = root->FirstChild();
while (childNode)
{
Pulsator* tempPulse = new Pulsator();
float bpm;
float type;
std::string::size_type sz;
tinyxml2::XMLElement* data = childNode->ToElement();
string inputdata = data->Attribute("bpm");
bpm = std::stof(inputdata, &sz);
if (type == 0)
{
tempPulse->type = Obstacle;
tempPulse->SetColor(D2D1::ColorF(D2D1::ColorF::Black));
}
if (type == 1)
{
tempPulse->type = Enemy;
tempPulse->SetColor(D2D1::ColorF(D2D1::ColorF::Red));
}
if (type == 2)
{
tempPulse->type = Score;
tempPulse->SetColor(D2D1::ColorF(D2D1::ColorF::Green));
}
else
{
tempPulse->type = No_Type;
}
objects.push_back(tempPulse);
}
}
Every time I get to the root node, it loads incorrectly and the child node becomes null.
Am I using this incorrectly or is there an issue with my XML file?
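The code doesn't ask for the right children. level.FirstChild() returns the first node of the document, which here is the <?xml ...?> declaration rather than <Level>, and a declaration has no children, so childNode ends up null. You want the first XMLElement, not the first child, at both levels:
tinyxml2::XMLElement* root = level.FirstChildElement();
tinyxml2::XMLElement* childNode = root->FirstChildElement();
And that saves you the cast later (you don't need, and shouldn't use, ToElement()).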
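Once the root is found, a few more problems in the loop show up. type is compared without ever being read from the XML (something like data->QueryIntAttribute("type", &type) with an int type would read it), and childNode is never advanced, so the while loop never ends; childNode = childNode->NextSiblingElement(); at the bottom of the loop fixes that. The if/if/if/else chain is also wrong: the else belongs only to the last if, so types 0 and 1 are immediately overwritten with No_Type. The plain-C++ sketch below shows that bug and a switch-based fix; the PulseType enum is a hypothetical stand-in for whatever type the question's Obstacle/Enemy/Score/No_Type values belong to.

```cpp
// Hypothetical enum reusing the value names from the question.
enum PulseType { Obstacle, Enemy, Score, No_Type };

// Same control flow as the question: the else pairs with the last if only.
PulseType buggyClassify(int type) {
    PulseType result = No_Type;
    if (type == 0) result = Obstacle;
    if (type == 1) result = Enemy;
    if (type == 2) result = Score;
    else result = No_Type;  // also runs for 0 and 1, overwriting them
    return result;
}

// A switch makes the cases mutually exclusive.
PulseType classify(int type) {
    switch (type) {
        case 0: return Obstacle;
        case 1: return Enemy;
        case 2: return Score;
        default: return No_Type;
    }
}
```

An if / else if / else chain would work just as well; the switch simply makes the exclusivity obvious.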

Attempting to read nodes in Rapid XML resulting in error

I have a class in my program that uses Rapid XML to write data to file. This process works fine. However when I attempt to read the same data, my program will always be halted by internal error catching code, explaining "next sibling returned NULL but attempted to read value anyways".
if (xmlFile.good())
{
vector<char> buffer((istreambuf_iterator<char>(xmlFile)), istreambuf_iterator<char>());
buffer.push_back('\0');
doc.parse<0>(&buffer[0]);
root_node = doc.first_node("CityData");
for(xml_node<> * bound_node = root_node->first_node("Boundaries"); bound_node; bound_node = bound_node->next_sibling())
{
if (bound_node->first_attribute("enabled")->value() != NULL)
{
int enabled = atoi(bound_node->first_attribute("enabled")->value());
if (enabled == 1)
boundaries = true; // Program globals
}
}
if (boundaries)
{
for(xml_node<> * dimen_node = root_node->first_node("Dimensions"); dimen_node; dimen_node = dimen_node->next_sibling())
{
cityDim.x = atoi(dimen_node->first_attribute("x-val")->value()); // Program globals
cityDim.y = atoi(dimen_node->first_attribute("y-val")->value());
}
}
An example of how the data appears in the XML file:
<CityData version="1.0" type="Example">
<Boundaries enabled="1"/>
<Dimensions x-val="1276" y-val="688"/>
If I add a breakpoint before either loop reiterates and look at the values, I can see they were read on the first iteration. However, the end condition for the loop appears to be wrong, and the error occurs after next_sibling(). I cannot understand the issue, as the code for the loop was copied unmodified from an example (aside from variable renaming) and looks correct to me (changing the condition to node != NULL does not help).
In the bound_node-loop the variable bound_node first points to <Boundaries enabled="1"> and you are able to read the attribute with the name enabled. After the call to next_sibling(), bound_node points to <Dimensions .../> and the call to first_attribute("enabled") will return a null pointer because this xml-node does not have an attribute with this name and the subsequent call to value() will cause the program to crash.
I do not understand why you are writing a loop over all nodes. If the xml-file looks like this
<CityData version="1.0" type="Example">
<Boundaries enabled="1"/>
<Dimensions x-val="1276" y-val="688"/>
</CityData>
Then you can extract the values like this:
xml_node<> const * bound_node = root_node->first_node("Boundaries");
if (bound_node)
{
xml_attribute<> const * attr_enabled = bound_node->first_attribute("enabled");
if (attr_enabled)
{
int enabled = atoi(attr_enabled->value());
if (enabled == 1)
boundaries = true;
}
}
if (boundaries)
{
xml_node<> const * dimen_node = root_node->first_node("Dimensions");
if (dimen_node)
{
xml_attribute<> const * xval = dimen_node->first_attribute("x-val");
xml_attribute<> const * yval = dimen_node->first_attribute("y-val");
if (xval && yval)
{
cityDim.x = atoi(xval->value());
cityDim.y = atoi(yval->value());
}
}
}
}
You can write some else-clauses to signal errors, of course.
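One more place where errors can slip through is atoi(): it returns 0 both for "0" and for garbage such as "abc", and its behaviour is undefined if the value does not fit in an int. A sketch of a checked replacement built on std::strtol follows; parseInt is a hypothetical helper name, not part of RapidXML.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Parses a whole decimal string into an int. Returns false for a null
// pointer, an empty string, trailing junk, or a value out of int range.
bool parseInt(const char* s, int& out) {
    if (s == nullptr) return false;  // e.g. a missing attribute value
    errno = 0;
    char* end = nullptr;
    long v = std::strtol(s, &end, 10);
    if (end == s || *end != '\0') return false;  // no digits, or junk after them
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX) return false;
    out = static_cast<int>(v);
    return true;
}
```

Assuming cityDim.x is an int, the dimension code could then read if (xval && parseInt(xval->value(), cityDim.x)) { ... } and report an error in the else branch.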

C++ RapidXML get sibling of the same type?

So, in RapidXML, I'm trying to loop through my file to get the data from some tileset nodes:
rapidxml::xml_node<> *root_node = doc.first_node("map");
for(rapidxml::xml_node<> *tileset = root_node->first_node("tileset");
tileset != 0; tileset = tileset->next_sibling("tileset"))
{
// Iteration stuff...
You're probably saying, what's the problem? Well, in RapidXML, the next_sibling() function optionally matches the name:
xml_node<Ch>* next_sibling(const Ch *name=0, std::size_t name_size=0, bool
case_sensitive=true) const;
Gets next sibling node, optionally matching node name. Behaviour is undefined
if node has no parent. Use parent() to test if node has a parent.
Hence, if a node is not found with the name, it'll just return the next sibling regardless. This is a problem in my program, and I just plain don't want the extra iteration. I think this is stupid, but whatever. Is there a way to make it ONLY iterate through my tileset nodes?
"Optionally matching node name" means the parameter is optional. If you pass a name string and no later sibling has that name, you get a return value of zero (a null pointer), not an unrelated sibling:
xml_node<Ch> *next_sibling(const Ch *name = 0, std::size_t name_size = 0, bool case_sensitive = true) const
{
assert(this->m_parent); // Cannot query for siblings if node has no parent
if (name)
{
if (name_size == 0)
name_size = internal::measure(name);
for (xml_node<Ch> *sibling = m_next_sibling; sibling; sibling = sibling->m_next_sibling)
if (internal::compare(sibling->name(), sibling->name_size(), name, name_size, case_sensitive))
return sibling;
return 0;
}
else
return m_next_sibling;
}
I also had this problem and I used this small modification as a workaround, which works as intended.
rapidxml::xml_node<> *root_node = doc.first_node("map");
for(rapidxml::xml_node<> *tileset = root_node->first_node("tileset");
tileset != 0;
tileset = tileset->next_sibling())
{
if(strcmp(tileset->name(), "tileset")!=0)
continue;
//TODO: the usual loop contents
}
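Going by the next_sibling() source quoted above, the named loop from the question already visits only tileset nodes, and the strcmp workaround visits the same ones. The plain-C++ model below checks that; it does not use RapidXML, and the vector of element names standing in for the children of <map>, as well as nextNamed and the visit* functions, are hypothetical.

```cpp
#include <string>
#include <vector>

// Mimics next_sibling(name): search forward for a matching name and return
// -1 (RapidXML returns 0) when none is left. It never falls back to an
// unrelated sibling. from = -1 mimics first_node(name).
int nextNamed(const std::vector<std::string>& kids, int from, const std::string& name) {
    for (int i = from + 1; i < static_cast<int>(kids.size()); ++i)
        if (kids[i] == name) return i;
    return -1;
}

// Loop style from the question: first_node("tileset") / next_sibling("tileset").
std::vector<int> visitNamed(const std::vector<std::string>& kids) {
    std::vector<int> seen;
    for (int i = nextNamed(kids, -1, "tileset"); i != -1; i = nextNamed(kids, i, "tileset"))
        seen.push_back(i);
    return seen;
}

// Workaround style: walk every following sibling and skip other names.
std::vector<int> visitFiltered(const std::vector<std::string>& kids) {
    std::vector<int> seen;
    int start = nextNamed(kids, -1, "tileset");
    if (start == -1) return seen;
    for (int i = start; i < static_cast<int>(kids.size()); ++i) {
        if (kids[i] != "tileset") continue;
        seen.push_back(i);
    }
    return seen;
}
```

With children {"tileset", "layer", "tileset", "objectgroup"}, both functions visit indices 0 and 2 only.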