Parsing XML and dumping each <> block to a file - C++

I am trying to parse xml with some simple C++ which has blocks like
<mgrwt event="1">
...
...
...
</mgrwt>
<mgrwt event="2">
...
...
...
</mgrwt>
Now, I have a bash script which acts on each of these blocks. So my question is: how can I loop over the XML (I do not want to use RapidXML or something similar) so that I can easily dump each block to a small temp file?
My parser looks like
bool begin_tag = false;
while (getline(in,line))
{
std::string tmp;
tmp=line;
if(line.find("<mgrwt event=")<=line.size()){
cout<<line<<endl;
begin_tag = true;
continue;
}
else if (tmp == "</mgrwt>")
{
begin_tag = false;
}
}
}
thanks
Alex

I would recommend using an XML parser for reading XML files. Check out Expat, POCO XML or other libraries.
If you can't for whatever reason, and the stuff you're reading always looks exactly like your sample with no other formatting variations, you should also use find() to detect the end of the block, since an exact string comparison fails as soon as the closing tag has leading or trailing whitespace:
else if(line.find("</mgrwt>")<=line.size())
{
begin_tag = false;
}
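Putting the pieces together, here is a minimal sketch of the line-based approach, assuming every opening and closing tag sits on its own line as in the sample. The names split_blocks, dump_blocks and self_check, and the numbered file naming scheme, are illustrative and not from the original post.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Collect every <mgrwt ...> ... </mgrwt> block from a line-oriented stream.
// Assumption: opening and closing tags each sit on their own line.
std::vector<std::string> split_blocks(std::istream& in)
{
    std::vector<std::string> blocks;
    std::string line, current;
    bool in_block = false;
    while (std::getline(in, line)) {
        if (!in_block && line.find("<mgrwt event=") != std::string::npos) {
            in_block = true;   // opening tag found: start a fresh block
            current.clear();
        }
        if (in_block) {
            current += line;
            current += '\n';
            if (line.find("</mgrwt>") != std::string::npos) {
                blocks.push_back(current);   // closing tag found: block is complete
                in_block = false;
            }
        }
    }
    return blocks;
}

// Write each block to its own file: <prefix>1.xml, <prefix>2.xml, ...
// Returns the file names so a caller can hand them to an external script.
std::vector<std::string> dump_blocks(const std::vector<std::string>& blocks,
                                     const std::string& prefix)
{
    std::vector<std::string> names;
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        const std::string name = prefix + std::to_string(i + 1) + ".xml";
        std::ofstream out(name);
        out << blocks[i];
        names.push_back(name);
    }
    return names;
}

// Small self-check on an in-memory sample; call it from main() if desired.
void self_check()
{
    std::istringstream in("<mgrwt event=\"1\">\na\n</mgrwt>\n"
                          "<mgrwt event=\"2\">\nb\n</mgrwt>\n");
    const std::vector<std::string> blocks = split_blocks(in);
    assert(blocks.size() == 2);
}
```

Lines outside any block are skipped, and an unterminated block at the end of the input is dropped. If the tags can share a line with other content, a real XML parser is the safer choice.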

Related

Better way to skip comments in a file parsing

At the moment I use
<file>.eachLine { line ->
if (line ==~ /^#.*$/) {
return // skip comments
}
}
Is there an easier way?
Are you trying to separate the test for comments from the rest of the code in your closure?
You could do this, for some File 'f'....
f.filterLine( { it ==~ /^[^#].*/ } ).each { < process non-comments > }

Read XML node with RapidXML

I'm using RapidXML to parse XML files and read node content, but I don't want to read the values inside a node; I need to read the content of specific XML nodes "as XML", not as parsed values.
Example:
<node1>
<a_lot_of_xml>
< .... >
</a_lot_of_xml>
</node1>
I need to get the content of node1 as :
<a_lot_of_xml>
< .... >
</a_lot_of_xml>
What I tried:
I tried something, but it's not really good in my opinion: I put in node1 the path of another XML file to read, like this:
<file1ToRead>MyFile.xml</file1ToRead>
And then my C++ code is the following:
ifstream file(FileToRead);
stringstream buffer; buffer << file.rdbuf();
But the problem is users will have a lot of XML files to maintain and I just want to use one xml file.
I think "a lot of XML files" is the better way: you keep all the XML files in one directory and read each file only when you need it, which is good for performance.
Back to the problem: you can use the rapidxml::print function (from rapidxml_print.hpp) to get the node back in XML format.
bool test_analyze_xml(const std::string& xml_path)
{
try
{
rapidxml::file<> f_doc(xml_path.c_str());
rapidxml::xml_document<> xml_doc;
xml_doc.parse<0>(const_cast<char*>(f_doc.data()));
rapidxml::xml_node<>* node_1 = xml_doc.first_node("node1");
if(node_1 == NULL)
{
return false;
}
rapidxml::xml_node<>* plain_txt = node_1->first_node("a_lot_of_xml");
if (plain_txt == NULL)
{
return false;
}
std::string xml_data;
rapidxml::print(std::back_inserter(xml_data), *plain_txt, rapidxml::print_no_indenting); //the xml_data is XML format.
}
catch (...)
{
return false;
}
return true;
}
I'm unfamiliar with rapidxml, but I have done this with tinyxml2. The trick is to read out node1 and then create a new XMLDoc (using tinyxml2 terms here) that contains everything inside of node1. From there, you can use their XMLPrinter class to convert your new XMLDoc (containing everything in node1) to a string.
tinyxml2 is a free download.

Reading an XML file using QDomDocument only gets the first element

I generate an XML file using QXmlStreamWriter. The file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<pedestrianinfo>
<pedestrian uuid="2112e2ed-fc9b-41e8-bbcb-b44ad78bde11">
<module>11.1208</module>
<direction>4</direction>
<row>5</row>
<column>71</column>
</pedestrian>
<pedestrian uuid="1aabb9c1-4aa7-4f47-9542-36d2dfaa26e4">
<module>1.48032</module>
<direction>4</direction>
<row>67</row>
<column>31</column>
</pedestrian>
...
</pedestrianinfo>
Then I try to read the content by QDomDocument. My code looks like this:
xmlReader *xp = new xmlReader(QString("D:\\0T.xml"));
if(xp->openFile()) {
if(xp->isGetRootIndex()) {
xp->parseRootIndexElement();
}
else
cout<<"Unable to get root index."<<endl;
}
Here is isGetRootIndex():
bool xmlReader::isGetRootIndex()
{
doc.setContent(&file,false);
root = doc.documentElement();
if(root.tagName() == getRootIndex()) //rootIndex=="pedestrianinfo"
return true;
return false;
}
This is parseRootIndexElement():
void xmlReader::parseRootIndexElement()
{
QDomNode child = root.firstChild();
while(!child.isNull()) {
if(child.toElement().tagName() == getTagNameP()) // childTagName == "pedestrian"
parseEntryElement(child.toElement());
qDebug()<<"module="<<module<<" direction="<<direction<<" row="<<row<<" column="<<column;
child = child.nextSibling();
}
}
parseEntryElement(const QDomElement &element) is a function that gets the information in each tag and saves it into variables such as module.
However, each time I run my code, only the first child in the XML file is printed by qDebug. It seems that after executing child.nextSibling(), child becomes null. Why does it not get the next pedestrian info?
Looks correct to me based on what I see in the documentation. Perhaps parseEntryElement is advancing the iterator unexpectedly?

Parsing XML Elements using TinyXML

UPDATE: Still not working :( I have updated the code portion to reflect what I currently have.
This should be a pretty easy question for people who have used TinyXML. I'm attempting to use TinyXML to parse through an XML document and pull out some values. I figured out how to add in the library yesterday, and I have successfully loaded the document (hey, it's a start).
I've been reading through the manual and I can't quite figure out how to pull out individual attributes. After Googling around, I haven't found an example of my specific case, so perhaps someone here who has used TinyXML can help out. Below is a slice of the XML, and where I have started to parse it.
XML:
<EGCs xmlns="http://tempuri.org/XMLSchema.xsd">
<card type="EGC1">
<offsets>
[ ... ]
</offsets>
</card>
<card type="EGC2">
<offsets>
[ ... ]
</offsets>
</card>
</EGCs>
Loading/parsing code:
TiXmlDocument doc("EGC_Cards.xml");
if(doc.LoadFile())
{
TiXmlHandle hDoc(&doc);
TiXmlElement* pElem;
TiXmlHandle hRoot(0);
pElem = hDoc.FirstChildElement().Element();
if (!pElem) return false;
hRoot = TiXmlHandle(pElem);
//const char *attribval = hRoot.FirstChild("card").ToElement()->Attribute("card");
pElem = hDoc.FirstChild("EGCs").Child("card", 1).ToElement();
if(pElem)
{
const char* tmp = pElem->GetText();
CComboBox *combo = (CComboBox*)GetDlgItem(IDC_EGC_CARD_TYPE);
combo->AddString(tmp);
}
}
I want to pull out each card "type" and save it to a string to put into a combobox. How do I access this attribute member?
After a lot of playing around with the code, here is the solution! (With help from HERE)
TiXmlDocument doc("EGC_Cards.xml");
combo = (CComboBox*)GetDlgItem(IDC_EGC_CARD_TYPE);
if(doc.LoadFile())
{
TiXmlHandle hDoc(&doc);
TiXmlElement *pRoot, *pParm;
pRoot = doc.FirstChildElement("EGCs");
if(pRoot)
{
pParm = pRoot->FirstChildElement("card");
int i = 0; // for sorting the entries
while(pParm)
{
combo->InsertString(i, pParm->Attribute("type"));
pParm = pParm->NextSiblingElement("card");
i++;
}
}
}
else
{
AfxMessageBox("Could not load XML File.");
return false;
}
There should be an Attribute method that takes an attribute name as a parameter; see: http://www.grinninglizard.com/tinyxmldocs/classTiXmlElement.html
From the documentation, I see the code would look like:
hRoot.FirstChildElement("card").ToElement()->Attribute("type");
However, for the type of thing you are doing I would use XPath if at all possible. I have never used it, but the TinyXPath project may be helpful if you choose to go that route. The link is: http://tinyxpath.sourceforge.net/
Hope this helps.
The documentation I am working from is at: http://www.grinninglizard.com/tinyxmldocs/hierarchy.html
What you need is to get the attribute "type" from the element "card". So in your code it should be something like:
const char * attribval = hRoot.FirstChild("card").ToElement()->Attribute("type");

Parsing <multi_path literal="not_measured"/> in TinyXML

How do I parse the following in TinyXML:
<multi_path literal="not_measured"/>
I am able to easily parse the below line:
<hello>1234</hello>
The problem is that the first statement is not getting parsed the normal way. Please suggest how to go about this.
Not 100% sure what your question is asking, but here is a basic pattern to loop through XML files using TinyXML:
/*XML format typically goes like this:
<Value attribute='attributeValue'>
Text
</Value>
*/
TiXmlDocument doc("document.xml");
bool loadOkay = doc.LoadFile(); // Error checking in case file is missing
if(loadOkay)
{
TiXmlElement *pRoot = doc.RootElement();
TiXmlElement *element = pRoot->FirstChildElement();
while(element)
{
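string value = element->Value(); // Gets the element (tag) name
const char *attr = element->Attribute("attribute"); // NULL if the attribute is absent
const char *txt = element->GetText(); // NULL for an empty element such as <multi_path literal="not_measured"/>
string attribute = attr ? attr : ""; // Guard: building a string from NULL is undefined behaviour
string text = txt ? txt : "";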
element = element->NextSiblingElement();
}
}
else
{
//Error conditions
}