Fastest way to parse an XML file with libxml2? - c++

Hi, is there a faster way to parse an XML file with libxml2?
Right now I do it with the following C++ code:
void parse_element_names(xmlNode * a_node, int *calls)
{
xmlNode *cur_node = NULL;
for (cur_node = a_node; cur_node; cur_node = cur_node->next) {
(*calls)++;
if(xmlStrEqual(BAD_CAST "to",cur_node->name)){ // BAD_CAST casts the literal; xmlCharStrdup would allocate (and leak) a copy on every comparison
//printf("node type: <%d>, name <%s>, content: <%s> \n", cur_node->children->type, cur_node->children->name, cur_node->children->content);
//do something with the content
parse_element_names(cur_node->children->children,calls);
}
else if(xmlStrEqual(BAD_CAST "from",cur_node->name)) {
//printf("node type: <%d>, name <%s>, content: <%s> \n", cur_node->children->type, cur_node->children->name, cur_node->children->content);
//do something with the content
parse_element_names(cur_node->children->children,calls);
}
else if(xmlStrEqual(BAD_CAST "note",cur_node->name)) {
//printf("node type: <%d>, name <%s>, content: <%s> \n", cur_node->children->type, cur_node->children->name, cur_node->children->content);
//do something with the content
parse_element_names(cur_node->children->children,calls);
}
.
.
.
//about 100 more node names coming
else{
parse_element_names(cur_node->children,calls);
}
}
}
int main(int argc, char **argv)
{
xmlDoc *doc = NULL;
xmlNode *root_element = NULL;
if (argc != 2)
return(1);
/*parse the file and get the DOM */
doc = xmlReadFile(argv[1], NULL, XML_PARSE_NOBLANKS);
if (doc == NULL) {
printf("error: could not parse file %s\n", argv[1]);
return 1; /* nothing to walk if parsing failed */
}
int calls = 0;
/*Get the root element node */
root_element = xmlDocGetRootElement(doc);
parse_element_names(root_element,&calls);
/*free the document */
xmlFreeDoc(doc);
xmlCleanupParser();
return 0;
}
Is this really the fastest way, or is there a better/faster solution you can advise?
Thank you.

xmlReadFile et al. are based on libxml2's SAX parser interface (actually, the SAX2 interface), so it's generally faster to use your own SAX parser if you don't need the resulting xmlDoc.
If you have to distinguish between many different element names, as in your example, the fastest approach is usually to create a separate function for every type of node and use a hash table to look up these functions.
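A minimal sketch of that lookup-table idea, kept independent of libxml2 so it compiles on its own. The names ParseStats, on_to, on_from, handler_table and dispatch are made up for illustration; in a real program dispatch would presumably be called from a SAX startElement callback (or from the tree walk) with the element name.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical per-run state; stands in for whatever the handlers update.
struct ParseStats {
    int to_count = 0;
    int from_count = 0;
    int unknown_count = 0;
};

// One handler function per element name (names taken from the question).
using Handler = void (*)(ParseStats&);

void on_to(ParseStats& s)   { ++s.to_count; }
void on_from(ParseStats& s) { ++s.from_count; }

// Built once on first use; every later lookup is a single hash probe
// instead of a chain of up to ~100 string comparisons.
const std::unordered_map<std::string, Handler>& handler_table() {
    static const std::unordered_map<std::string, Handler> table = {
        {"to", on_to},
        {"from", on_from},
    };
    return table;
}

// Looks up the handler for an element name and runs it;
// names without a handler are only counted.
void dispatch(const std::string& name, ParseStats& stats) {
    const auto& table = handler_table();
    const auto it = table.find(name);
    if (it != table.end()) {
        it->second(stats);
    } else {
        ++stats.unknown_count;
    }
}
```

Whether this beats the if/else chain in practice depends on the number of names and on how the key is built from the parser's xmlChar* name, so it is worth measuring on real input.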

Related

Quick way to extract the information from .xml files into an object

I am a beginner, and right now I am trying to extract the key information from a .xml file and load it into an object of my class. For example:
Here is some of the information in the .xml file:
<row Id="17" Phone="12468" Address="Bos" />
<row Id="242" Phone="98324" Address="Chi" Age="30"/>
<row Id="157" Phone="23268" Age="25" />
<row Id="925" Phone="54325" Address="LA" />
And my class would be:
struct worker{ // struct, so the fields are publicly accessible
    string ID;
    string Phone;
    string Address;
    string Age;
};
I know the available information varies; if a field is missing from a line, we store "" (an empty string) for it. I also know the attributes appear in the same order as the fields in the class. I am trying to implement a function, let's say extractInfo(const string& line, const string& key):
// line: the whole line read from the .xml file
// key: one of "Id=\"", "Phone=\"", "Address=\"" or "Age=\"", so that I can find
//      the index just before the value I want to extract.
string extractInfo(const string& line, const string& key){
    size_t index = line.find(key);
    if(index == string::npos) return "";      // attribute not present on this line
    size_t start = index + key.length();      // first character after the opening quote
    size_t end = line.find('"', start);       // position of the closing quote
    if(end == string::npos) return "";        // malformed line: no closing quote
    return line.substr(start, end - start);
}
int main(){
    ...// for each line read from the .xml file, I build a new worker object and fill in its fields
    worker w;
    w.ID = extractInfo(line, "Id=\"");
    w.Phone = extractInfo(line, "Phone=\"");
    ...//etc.
    ...//then do other processing
    return 0;
}
My questions are: is there a way to read and load the information from the XML much more quickly through other APIs or functions? That is, can I improve this function for a huge .xml file, terabytes in size? And is there a way to use less memory to, for example, find the oldest worker and print it out? I know this is tough for me, and I am still working hard on it!
Thanks in advance for any ideas and advice!
You can parse XML with an existing XML parsing library, such as rapidxml or libxml2.
Note that the DOM approach is not really suitable for huge XML files, because it has to read the whole document to build the DOM tree. Instead, you can use libxml2's xmlreader interface to process the nodes one by one.
libxml2 xml reader
static void
streamFile(const char *filename) {
    xmlTextReaderPtr reader;
    int ret;
    reader = xmlReaderForFile(filename, NULL, 0);
    if (reader != NULL) {
        ret = xmlTextReaderRead(reader);
        while (ret == 1) {
            const xmlChar *name = xmlTextReaderConstName(reader);
            // only handle element start tags named "row"
            if (xmlTextReaderNodeType(reader) == XML_READER_TYPE_ELEMENT &&
                xmlStrEqual(BAD_CAST "row", name)) {
                // GetAttribute returns a copy (NULL if the attribute is absent)
                // that the caller must release with xmlFree
                xmlChar *id = xmlTextReaderGetAttribute(reader, BAD_CAST "Id");
                xmlChar *phone = xmlTextReaderGetAttribute(reader, BAD_CAST "Phone");
                // your code here...
                xmlFree(id);
                xmlFree(phone);
            }
            ret = xmlTextReaderRead(reader);
        }
        xmlFreeTextReader(reader);
        if (ret != 0) {
            fprintf(stderr, "%s : failed to parse\n", filename);
        }
    } else {
        fprintf(stderr, "Unable to open %s\n", filename);
    }
}
If your XML format is always like the above, you can also use std::regex_search to handle it:
https://en.cppreference.com/w/cpp/regex/regex_search
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string str = R"(<row Id="17" Phone="12468" Address="Bos" />)";
std::regex regex("(\\w+)=\"(\\w+)\"");
// get all tokens
std::smatch result;
while (std::regex_search(str, result, regex))
{
std::cout << result[1] << ": " << result[2] << std::endl;
str = result.suffix().str();
}
}

How to retrieve node and particular element string from xml file in C++ using libxml2 without using xpath?

How to retrieve text value in c++ using libxml?
XML file:
<?xml version="1.0" encoding="UTF-8"?>
<Help xmlns="http://www.example.org/File" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.example.org/File File.xsd ">
<Text>Test</Text>
</Help>
The XmlSoft site has a great set of examples for libxml2 demonstrating how to do the most common tasks.
xmlDoc *doc = NULL;
xmlNode *root_element = NULL;
if (argc != 2)
return(1);
/*
* this initialize the library and check potential ABI mismatches
* between the version it was compiled for and the actual shared
* library used.
*/
LIBXML_TEST_VERSION
/*parse the file and get the DOM */
doc = xmlReadFile(argv[1], NULL, 0);
if (doc == NULL) {
printf("error: could not parse file %s\n", argv[1]);
return 1; /* nothing to walk if parsing failed */
}
/*Get the root element node */
root_element = xmlDocGetRootElement(doc);
print_element_names(root_element);
/*free the document */
xmlFreeDoc(doc);
/*
*Free the global variables that may
*have been allocated by the parser.
*/
xmlCleanupParser();
static void
print_element_names(xmlNode * a_node)
{
xmlNode *cur_node = NULL;
for (cur_node = a_node; cur_node; cur_node = cur_node->next) {
if (cur_node->type == XML_ELEMENT_NODE) {
printf("node type: Element, name: %s\n", cur_node->name);
}
print_element_names(cur_node->children);
}
}

tinyxml parsing xml file

I have an XML file like this:
<?xml version="1.0"?>
<ApplicationSettings>
<BeamGeometry
Dimension="2"
Type="fan"
Shape="arc"
LengthFocalPointToISOCenter="558"
LengthISOCenterToDetector="394"
LengthDetectorSeperation="0.98"
LengthModuleSeperation="0.04"
NumberModules="57"
NumberDetectorsPerModule="16"
NumberISOCenterShift="3.25" />
</ApplicationSettings>
I'd like to use tinyxml to retrieve all the values (such as 558) by entry name (such as LengthFocalPointToISOCenter). Here is my code, which does not work yet.
int SetFanbeamGeometry(const char* filename)
{
int ret = TRUE;
TiXmlDocument doc("E:\\Projects\\iterativeRecon\\ProjectPackage\\ApplicationSettings\\ApplicationSettings.xml");
int LengthFocalPointToISOCenter;
if( doc.LoadFile())
{
TiXmlHandle hDoc(&doc);
TiXmlElement *pRoot, *pParm;
pRoot = doc.FirstChildElement("ApplicationSettings");
if(pRoot)
{
pParm = pRoot->FirstChildElement("BeamGeometry");
int i = 0; // for sorting the entries
while(pParm)
{
pParm = pParm->NextSiblingElement("BeamGeometry");
i++;
}
}
}
else
{
printf("Warning: ApplicationSettings is not loaded!");
ret = FALSE;
}
return ret;
}
I am wondering how I can use tinyxml to do that. Sorry, I am a first-time user and it looks confusing to me. Thanks.
There's only one BeamGeometry child element in the snippet you've shown; the information you're trying to access is stored in its attributes, not in individual elements.
So you need something like this:
// ...
pParm = pRoot->FirstChildElement("BeamGeometry");
if (pParm)
{
const char* pAttr = pParm->Attribute("LengthFocalPointToISOCenter");
if (pAttr)
{
int iLengthFocalPointToISOCenter = (int)strtol(pAttr, NULL, 10); // integer attributes only; values such as "0.98" need strtod
// do something with the value
}
}
If you want to loop through all attributes, it's quite simple:
const TiXmlAttribute* pAttr = pParm->FirstAttribute();
while (pAttr)
{
const char* name = pAttr->Name(); // attribute name
const char* value = pAttr->Value(); // attribute value
// do something
pAttr = pAttr->Next();
}
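Since several of the attributes in the question are not integers (for example "0.98" and "3.25") and some are not numeric at all ("fan", "arc"), a checked conversion may be useful once the attribute string has been retrieved. The helper below is a sketch using only the C standard library; parse_double is a made-up name and is not part of tinyxml.

```cpp
#include <cerrno>
#include <cstdlib>

// Converts an attribute string to a double.
// Returns false (leaving `out` untouched) if the text is null, empty,
// not entirely numeric, or out of range for a double.
bool parse_double(const char* text, double& out) {
    if (text == nullptr || *text == '\0') return false;
    errno = 0;
    char* end = nullptr;
    const double value = std::strtod(text, &end);
    if (end == text || *end != '\0' || errno == ERANGE) return false;
    out = value;
    return true;
}
```

It could be applied to the `value` pointer inside the attribute loop above, skipping attributes for which it returns false.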

libxml2 xpath parsing, doesn't work as expected

I decided to use the libxml2 parser for my Qt application and I'm stuck on XPath expressions. I found an example class and methods and modified them a bit for my needs. The code:
QStringList* LibXml2Reader::XPathParsing(QXmlInputSource input)
{
xmlInitParser();
xmlDocPtr doc;
xmlXPathContextPtr xpathCtx;
xmlXPathObjectPtr xpathObj;
QStringList *valList =NULL;
QByteArray arr = input.data().toUtf8(); //convert input data to utf8
int length = arr.length();
const char* data = arr.data();
doc = xmlRecoverMemory(data,length); // build a tree, ignoring the errors
if(doc == NULL) { return NULL;}
xpathCtx = xmlXPathNewContext(doc);
if(xpathCtx == NULL)
{
xmlFreeDoc(doc);
xmlCleanupParser();
return NULL;
}
xpathObj = xmlXPathEvalExpression(BAD_CAST "//[@class='b-domik__nojs']", xpathCtx); //here is where the parsing fails
if(xpathObj == NULL)
{
xmlXPathFreeContext(xpathCtx);
xmlFreeDoc(doc);
xmlCleanupParser();
return NULL;
}
xmlNodeSetPtr nodes = xpathObj->nodesetval;
int size = (nodes) ? nodes->nodeNr : 0;
if(size==0)
{
xmlXPathFreeContext(xpathCtx);
xmlFreeDoc(doc);
xmlCleanupParser();
return NULL;
}
valList = new QStringList();
for (int i = 0; i < size; i++)
{
xmlNodePtr current = nodes->nodeTab[i];
const char* str = (const char*)current->content;
qDebug() << "name: " << QString::fromLocal8Bit((const char*)current->name);
qDebug() << "content: " << QString::fromLocal8Bit((const char*)current->content) << "\r\n";
valList->append(QString::fromLocal8Bit(str));
}
xmlXPathFreeObject(xpathObj);
xmlXPathFreeContext(xpathCtx);
xmlFreeDoc(doc);
xmlCleanupParser();
return valList;
}
As an example, I'm making a request to http://yandex.ru/ and trying to get the node with class b-domik__nojs, which is basically one div.
xpathObj = xmlXPathEvalExpression(BAD_CAST "//[@class='b-domik__nojs']", xpathCtx); //here is where the parsing fails
The problem is that the expression //[@class='b-domik__nojs'] doesn't work at all. I checked it in a Firefox XPath extension and in the Opera developer tools XPath extension, and there the expression works perfectly.
I also tried to get other nodes by attribute, but for some reason XPath fails for ANY attribute. Is there something wrong with my method? Also, when I load a tree using xmlRecover, it gives me a lot of parser errors in the debug output.
OK, I played with my libxml2 function a bit more and used the "//*" expression to get all elements in the document, but it returns only the elements in the first child node of the body tag.
So basically it gets ALL the elements in the first div (div class="b-line b-line_bar"), but for some reason doesn't look at the elements in the other child nodes of <body>.
Why can that happen? Maybe xmlParseMemory doesn't build a full tree for some reason? Is there any possible way to fix this?
It is really strange that the expression works anywhere, because it is not a valid XPath expression. After the axis specification (//), there should be a node test (an element name or *) before the predicate (the condition in square brackets):
//*[@class='b-domik__nojs']
All right, it works now. My mistake was using the XML functions to build a tree from an HTML document. I used htmlReadMemory and the tree is now fully built. Some code again:
xmlInitParser();
xmlDocPtr doc;
xmlXPathContextPtr xpathCtx;
xmlXPathObjectPtr xpathObj;
QByteArray arr = input.data().toUtf8();
int length = arr.length();
const char* data = arr.data();
doc = htmlReadMemory(data,length,"",NULL,HTML_PARSE_RECOVER);
if(doc == NULL) { return NULL;}
xpathCtx = xmlXPathNewContext(doc);
if(xpathCtx == NULL)
{
xmlFreeDoc(doc);
xmlCleanupParser();
return NULL;
}
xpathObj = xmlXPathEvalExpression(BAD_CAST "//*[@class='b-domik__nojs']", xpathCtx);
etc.

string parser text adventure

Hello! I am currently working on a text adventure in C++ and could use some help.
What I'm trying to do is let the user input a command like the following:
'go kitchen'
'open door with key'
and make the game react accordingly.
Our teacher gave us the following code (which I have modified) and I'm having difficulty understanding what exactly it is doing and how I can use it to make the game. I modified it so that the user can input strings and it does tokenize the string wonderfully into a verb, object, preposition and object2.
But what I need to do then is somehow compare the input to a list of available commands. This is what I'm having trouble accomplishing at the moment. I am new to programming and need to do this as a homework assignment for my studies. Any help would be much appreciated.
struct command {
char* verb;
char* object;
char* preposition;
char* object2;
};
// Splits acInput in place into verb, object and an optional preposition + object2.
// Returns true on success, false if the input is malformed.
bool getTokens(char * acInput,
               const char * token_delimiters,
               command * pTargetCommand)
{
    char * pCurToken;
    // strtok needs a null-terminated string of delimiters, not a pointer to a single char
    pCurToken = strtok (acInput, token_delimiters);
    if (pCurToken == NULL) {
        printf("Error: Found no verb");
        getchar();
        return false;
    }
    pTargetCommand->verb = pCurToken;
    pCurToken = strtok (NULL, token_delimiters);
    if (pCurToken == NULL) {
        printf("Error: Found no object");
        getchar();
        return false;
    }
    pTargetCommand->object = pCurToken;
    pCurToken = strtok (NULL, token_delimiters);
    if (pCurToken != NULL) {
        pTargetCommand->preposition = pCurToken;
        pCurToken = strtok (NULL, token_delimiters);
        if (pCurToken == NULL) {
            printf("Error: Found no second object for preposition");
            getchar();
            return false;
        }
        pTargetCommand->object2 = pCurToken;
    }
    pCurToken = strtok (NULL, token_delimiters);
    if (pCurToken != NULL) {
        printf("Error: too many tokens.");
        getchar();
        return false;
    }
    return true;
}
int _tmain(int argc, _TCHAR* argv[])
{
    char acInput[256];
    cin.getline (acInput,256);
    command myCommand = { NULL };
    int RoomChoice = 0;
    printf ("Splitting string \"%s\" into tokens:\n", acInput);
    // assumes TOKEN_DELIMITER is a string of delimiter characters, e.g. " "
    if (getTokens(acInput, TOKEN_DELIMITER, &myCommand)) {
        printf ("Verb: %s\n", myCommand.verb);
        printf ("object: %s\n", myCommand.object);
        // the preposition and second object are optional and may be NULL
        printf ("preposition: %s\n", myCommand.preposition ? myCommand.preposition : "(none)");
        printf ("object2: %s\n", myCommand.object2 ? myCommand.object2 : "(none)");
    }
    getchar();
    return 0;
}
Without giving too much of your homework assignment away, you'll need to somehow read the list of all available actions into a structure, then compare against that structure.
As a hint, depending on the pattern, that might be a switch() {} statement or a collection like an array.
Consider
switch (myCommand.verb)
Case "go":
In a real-world application, you'd spin up a factory of command objects, then invoke one of those. Here, however, I would suggest thinking through your control statements.
You cannot switch on strings (as you already noted, switch only works with constant integral values).
To compare strings you can use strcmp or strncmp, or better yet, std::string::compare. You should be able to find enough information about them with a web search.
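One way to combine the two suggestions, sketched under the assumption that the verb arrives as a C string as in the question's command struct: compare the token against a small table with strcmp and map it to an enum value, which can then be used in an ordinary switch. The names Verb, VerbEntry, kVerbs and lookup_verb are illustrative only, and the table holds just the two verbs from the question's examples.

```cpp
#include <cstring>

// Verbs the game understands; Unknown covers everything else.
enum class Verb { Go, Open, Unknown };

// One table row: the text the player types and the verb it maps to.
struct VerbEntry {
    const char* text;
    Verb verb;
};

// The list of available commands; extend this table to add verbs.
const VerbEntry kVerbs[] = {
    {"go", Verb::Go},
    {"open", Verb::Open},
};

// Returns the verb matching `word` (exact, case-sensitive comparison),
// or Verb::Unknown if the word is null or not in the table.
Verb lookup_verb(const char* word) {
    if (word == nullptr) return Verb::Unknown;
    for (const VerbEntry& entry : kVerbs) {
        if (std::strcmp(word, entry.text) == 0) return entry.verb;
    }
    return Verb::Unknown;
}
```

With this in place, `switch (lookup_verb(myCommand.verb))` with `case Verb::Go:` and so on becomes valid C++, since the switch now operates on an enum rather than on a string.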