I have an XML reader in C++ and I am writing an error-checking function (a "proofer") that only sends complete XML trees to the parser. The data is in a char array like
char chunkdata[245];
Then convert it to a string like
String data(chunkdata);
And parse the data.
The program can receive chunked data at any time and must process it. The only problem is that a chunk sometimes contains an incomplete XML tree, so I might get only half of the content in a char array, like
<?xml version="1.0" encoding="UTF-8"?>
<note>
<to> Tove</to>
<from>Jani</from>
<heading>Remin
And a few milliseconds later I get the rest
der</heading>
<body>Don't forget me this weekend!</body>
</note>
After processing, this produces two separate strings and crashes the program.
What could I add to my code to either wait until the data is complete, or extract only the complete XML trees and keep the remainder to prepend to the next chunk? I tried things like string find with substr, processing the complete part and adding the remainder later, but it didn't work. Any suggestions? Thank you.
If the only thing you're doing is a validator that reads the file in block mode, then you should probably keep track of opened and closed tags in some sort of separate structure. If your buffer can change its length (I'm not sure what String is, but std::string certainly can change its size during runtime), you probably want to have something similar to the following:
std::map<std::string, long> tags;
And when you encounter a tag opening, do:
// operator[] value-initializes a missing entry to 0, so no lookup is needed
// (find() returns an iterator, which cannot be tested as a bool)
tags[tagName]++;
And when you encounter a tag closing, do:
// a close tag seen before its open tag leaves the count at -1
tags[tagName]--;
The tags are closed properly only if all the elements of the map are equal to 0. Let's assume testForCorrectness() does just that. Then your code would look like this:
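char chunkdata[245];
readSomeData();                  // fills chunkdata (assumed to be NUL-terminated)
String data(chunkdata);
while(!testForCorrectness()){    // keep reading until every tag is closed
    readSomeData();
    data += (String)chunkdata;
}
return data;                     // the accumulated buffer, not just the last chunk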
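The counting idea above could be sketched roughly as follows. TagCounter, onTagOpen() and onTagClose() are hypothetical names introduced only for illustration; the sketch assumes something else has already found the tag names in the buffer, and it cannot detect tags closed in the wrong order.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical wrapper around the tag-count map described above.
struct TagCounter {
    std::map<std::string, long> tags;

    // operator[] creates a missing entry with value 0 before the update.
    void onTagOpen(const std::string& tagName)  { ++tags[tagName]; }
    void onTagClose(const std::string& tagName) { --tags[tagName]; }

    // True when every tag seen so far has been closed as often as opened.
    // Note: this is also true for an empty map (no tags seen yet).
    bool testForCorrectness() const {
        for (const auto& entry : tags)
            if (entry.second != 0) return false;
        return true;
    }
};
```

With the note example, the counter stays unbalanced after the first chunk (note and heading are still open) and only balances once the second chunk has been scanned.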
If you also want to test if the tags were closed in the correct order - try using a vector instead:
std::vector<std::string> openedTags;
On tag begin:
openedTags.push_back(tagName);
On tag close:
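// back() on an empty vector is undefined behavior, so check empty() first
if(!openedTags.empty() && openedTags.back() == tagName)
    openedTags.pop_back();
else {
    // XML is ill-formed
}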
The document is complete once openedTags.empty() is true again.
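As a rough sketch of the vector-based check applied to an accumulating buffer: XmlState and checkNesting() are hypothetical names, and the scanner is deliberately naive (it does not handle comments, CDATA sections, or a '>' inside an attribute value), so treat it as an illustration of the idea rather than a validator.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class XmlState { Complete, Incomplete, IllFormed };

// Scans the buffer for tags and tracks nesting with a stack of open tag names.
XmlState checkNesting(const std::string& data) {
    std::vector<std::string> openedTags;
    bool sawElement = false;
    std::size_t pos = 0;
    while ((pos = data.find('<', pos)) != std::string::npos) {
        std::size_t end = data.find('>', pos);
        if (end == std::string::npos)
            return XmlState::Incomplete;          // tag cut off by the chunk boundary
        std::string body = data.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (body.empty() || body[0] == '?' || body[0] == '!')
            continue;                             // declaration, comment, DOCTYPE
        sawElement = true;
        if (body.back() == '/')
            continue;                             // self-closing element
        if (body[0] == '/') {                     // closing tag
            std::string name = body.substr(1);
            name = name.substr(0, name.find_first_of(" \t\r\n"));
            if (openedTags.empty() || openedTags.back() != name)
                return XmlState::IllFormed;
            openedTags.pop_back();
        } else {                                  // opening tag: keep the name only
            openedTags.push_back(body.substr(0, body.find_first_of(" \t\r\n")));
        }
    }
    return (sawElement && openedTags.empty()) ? XmlState::Complete
                                              : XmlState::Incomplete;
}
```

The caller would keep appending chunks to the buffer while the result is Incomplete, hand the buffer to the parser on Complete, and report an error on IllFormed.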
I am using xerces-c to parse an XML file but I am getting some strange results.
I create my own DocumentHandler (derived from HandlerBase) and override:
void characters(const XMLCh* const chars, const unsigned int length);
this way I receive notification of character data inside an element.
To parse a file I create a parser, create an inputbuffer, create my handler and call parse.
SAXParser* lp_parser = new SAXParser();
XMLCh* lp_fileName = XMLString::transcode("myfile.xml");
LocalFileInputSource l_fileBuf(lp_fileName);
XMLString::release(&lp_fileName);
MyHandler l_handler;
lp_parser->setDocumentHandler((DocumentHandler *)&l_handler);
lp_parser->parse(l_fileBuf);
delete lp_parser;
The problem is that characters([...]) is not only called with character data; for each tag it is also called (sometimes several times) with a run of spaces and a newline as character data.
For example, <Tag>Value</Tag> yields two calls to characters([...]): one where the data is 'Value' and another (or several) where the data is something like ' \n '
The XML file itself doesn't contain these characters. I have used xerces-c to parse XML like this many times without any problems, although this is the first time I have used a LocalFileInputSource (I usually use a MemBufInputSource).
Any ideas?
I had a similar problem with SAX2XMLReader. What I understood is that with SAX parsers it is up to the developer to keep track of where they are in the XML structure while parsing.
It is possible that these subsequent calls to characters() are for other tags in the file, or for ignorable whitespace.
Depending on the length of the data, it is also possible for the characters() callback to be called several times for the same tag, and it is up to you to concatenate the data you receive on each call.
So what I would do is detect the start and end of <Tag> with the startElement() and endElement() callbacks. That way you can discard subsequent calls to characters() once you have received endElement() for your tag.
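The bookkeeping described above might look roughly like this. TagTextCollector is a hypothetical stand-in for a handler class: the method names mirror the callbacks mentioned, but the std::string parameters are a simplification and are not the Xerces-C signatures (which use XMLCh and, for startElement, an attribute list).

```cpp
#include <cassert>
#include <string>
#include <utility>

// Collects character data only while inside the element named `target`.
class TagTextCollector {
public:
    explicit TagTextCollector(std::string target) : target_(std::move(target)) {}

    void startElement(const std::string& name) {
        if (name == target_) { inside_ = true; text_.clear(); }
    }
    // May be called several times per element, so append rather than assign.
    void characters(const std::string& chars) {
        if (inside_) text_ += chars;
    }
    void endElement(const std::string& name) {
        if (name == target_) inside_ = false;   // later characters() calls are ignored
    }
    const std::string& text() const { return text_; }

private:
    std::string target_;
    std::string text_;
    bool inside_ = false;
};
```

In a real handler the same three steps would live in the overridden callbacks, with the XMLCh data transcoded or appended to an XMLCh-based buffer instead.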
I have a simple XML file which I need to read in C++. I am working on Windows, so I chose MSXML. It wouldn't be a problem if not for the way the data is saved in the XML file. I cannot modify the files, as I have a lot of them and may get many more in the future. The part of the XML file that interests me the most is:
<data>
<sample cost="2.000000000000000e+01">1</sample>
</data>
At the beginning of the XML file, the precision of the numbers and how many digits can be ignored are specified.
So far, doing:
MSXML::IXMLDOMNodeListPtr temp = xmlDoc->selectNodes("data/*");
temp->Getitem(0)->Getxml();
gives me the whole line as a string. Also:
temp->Getitem(0)->Gettext();
gives me the number between the tags (in this case 1), but as a string. I don't know how to access the number inside the <> (the cost value) without manually extracting it from the string returned by Getxml().
Extracting these numbers manually from the string and converting them to double and int isn't a problem, but I want to know whether there is a way to access them directly as double and int.
I am trying to read a value out of an XML file.
I read the right line into a char[255] using fgets.
When I try to extract the node's content, the target variable doesn't change.
XML:
\t<name>A fancy Name</name>
C++:
...
char buff[255];
fgets(buff,255,Filestream);
scanf(buff,"\t<name>%[^<]</name>",&(Daten.name));
Daten.name is a UnicodeString (as used by Embarcadero's C++Builder), by the way.
But Daten.name remains unchanged...
I also tried passing Daten.name directly instead of a pointer to it, but that doesn't seem to change anything...
Can you help me find my mistake here?
I'm trying to scan a text file with XML, the XML has a number of items with this structure:
<enemy>
<type> 0 </type>
<x> 273 </x>
<y> 275 </y>
<event> </event>
</enemy>
The problem is that the xml may have spaces between tags or inside them. I created a loop and I'm trying to do a single scan in each iteration to get int type, x, y and event into a variable each. However I don't know how to ignore whitespaces nor how to handle missing values since some tags may or may not have a value (like event).
How can I scan this "enemy" regardless of spacing and missing values?
That's an easy one: you do not parse XML using fscanf(). Use a real XML parser; otherwise you will end up with very complicated code that will not work 80% of the time, either returning wrong data or crashing.
The XML format, despite its seeming simplicity, is complicated even in the most innocuous cases, and existing XML parsers are there for a reason. See libxml or one of the many others.
Still, if you are hell-bent on parsing XML yourself, the right way to do it is to first tokenize the input and then check that the token sequences form valid structures. That's far more complicated than a simple fscanf().
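To give a feel for what the tokenizing step involves, here is a minimal sketch. Token and tokenize() are hypothetical names, and the tokenizer is deliberately naive: it knows nothing about attributes, comments, CDATA, or entities, which is exactly the ground a real parser covers.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Token {
    enum Kind { Open, Close, Text } kind;
    std::string value;   // tag name, or trimmed text content
};

// Splits input into open tags, close tags, and trimmed text.
// Whitespace-only text between tags is dropped.
std::vector<Token> tokenize(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t pos = 0;
    while (pos < input.size()) {
        if (input[pos] == '<') {
            std::size_t end = input.find('>', pos);
            if (end == std::string::npos) break;        // truncated tag: stop
            std::string name = input.substr(pos + 1, end - pos - 1);
            if (!name.empty() && name[0] == '/')
                tokens.push_back({Token::Close, name.substr(1)});
            else
                tokens.push_back({Token::Open, name});
            pos = end + 1;
        } else {
            std::size_t end = input.find('<', pos);
            if (end == std::string::npos) end = input.size();
            std::string text = input.substr(pos, end - pos);
            std::size_t first = text.find_first_not_of(" \t\r\n");
            if (first != std::string::npos) {           // skip whitespace-only runs
                std::size_t last = text.find_last_not_of(" \t\r\n");
                tokens.push_back({Token::Text, text.substr(first, last - first + 1)});
            }
            pos = end;
        }
    }
    return tokens;
}
```

With this, an empty element such as <event> </event> shows up as an Open token followed directly by a Close token, so a missing value is visible in the token sequence instead of derailing a format string. Checking that the sequence is well formed is still a separate step.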
const XMLDataNode *pointsNode = node->GetChildren().at(0);
std::wistringstream pointsstrm(*pointsNode->GetInnerText());
pointsstrm >> loadedGame.points;
This is code I've written to pull an int from an XML file and store it in loadedGame.points (an int). However, it isn't working: it compiles but doesn't give the right value. Why is that? XMLDataNode is a class that manipulates xmllite.dll.
Time for some wild guesses!
I'll bet you that the text you get from *pointsNode->GetInnerText() isn't what you think it is. Have you checked that it is indeed exactly the text you want? In particular, could it contain whitespace? Parsing a nicely formatted (i.e. indented, broken into lines, etc.) XML file without a schema to reference means that all sorts of text nodes consisting of whitespace will end up in your DOM tree.
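One way to check that guess is to make the conversion report failure instead of silently leaving the target untouched. The helper below is a hypothetical sketch; it assumes the inner text is available as a std::wstring, which the question's use of std::wistringstream suggests but does not confirm.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns true only if `text` holds exactly one integer, optionally
// surrounded by whitespace. `out` is written only on success.
bool parseInt(const std::wstring& text, int& out) {
    std::wistringstream strm(text);
    int value = 0;
    if (!(strm >> value)) return false;   // no integer at the start (e.g. whitespace only)
    strm >> std::ws;                      // skip any trailing whitespace
    if (!strm.eof()) return false;        // something other than whitespace is left over
    out = value;
    return true;
}
```

If parseInt() returns false for the node you picked, you are probably looking at a whitespace-only text node or at a different child than you expected.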