Find a line (and column) of xml_node in rapidxml - c++

From what I could understand in the docs, I deduced that every xml_node knows its position in the source text. What I'd like to do is retrieve the LINE and COLUMN for a given xml_node<>*:
rapidxml::file<> xmlFile("generators.xml"); // Open file, default template is char
xml_document<> doc; // character type defaults to char
doc.parse<0>(xmlFile.data()); // 0 means default parse flags
xml_node<> *main = doc.first_node(); //Get the main node that contains everything
cout << "My first node is: <" << main->name() << ">\n";
cout << " located at line " << main->?????() << ", column " << main->?????() << "\n";
How should I retrieve those offsets? Could I somehow crawl from the main->name() pointer back to the beginning of the document? But how can I access the document string from xml_document<> doc to compare offsets?

Let's say you parse a simple xml document in a string.
char xml[] = "<hello/><world/>";
doc.parse<0>(xml);
RapidXML will insert null terminators (and maybe make other modifications to the "document"), so it might look like this now:
char xml[] = "<hello\000\000<world\000\000";
If you then ask for the name() of the 'hello' node, it returns a pointer to the 'h' in your xml array. You can just subtract the base of the array to get an offset.
int offset = node->name() - &xml[0];
Obviously this isn't line and column. To get that, you'd need to count the number of newlines between the array start and the offset. (But maybe do this on a 'clean' copy of the XML data, as RapidXML might well mangle newline sequences in the processed version.)
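The newline counting described above could be sketched as follows. This is a minimal illustration, not part of RapidXML: line_and_column is a hypothetical helper, and it assumes you kept an unmodified copy of the text (for example std::string original(xmlFile.data()) taken before calling parse) and computed the offset as node->name() - xmlFile.data().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper (not a RapidXML function): converts a character offset
// into a 1-based (line, column) pair by scanning an untouched copy of the text.
std::pair<std::size_t, std::size_t> line_and_column(const std::string& original,
                                                    std::size_t offset)
{
    assert(offset <= original.size());
    std::size_t line = 1;
    std::size_t column = 1;
    for (std::size_t i = 0; i < offset; ++i) {
        if (original[i] == '\n') {
            ++line;      // a newline starts the next line
            column = 1;  // and resets the column
        } else {
            ++column;
        }
    }
    return std::make_pair(line, column);
}
```

Note that name() points at the first character of the element name, so the reported column is one past the opening '<'.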

Related

std::ostringstream not initialized with std::string [duplicate]

I'm making an OpenGL game in C++. I'm fairly inexperienced in C++ compared to other languages. Anyway, I create a stringstream with the "base" directory for some images. I then pass this stringstream as a function parameter to a constructor. The constructor appends an image file name, then attempts to load the resulting path. However...
D:\CodeBlocks Projects\SnakeRoid\bin\Debug\Texts\ <-- before appending the filename
Ship01.tgacks Projects\SnakeRoid\bin\Debug\Texts\ <-- After.
Obviously not correct! The result should be D:\CodeBlocks Projects\SnakeRoid\bin\Debug\Texts\Ship01.tga
The relevant parts of my code:
std::stringstream concat;
std::string texFullPath = "Path here";
...
concat.str(""); //Reset value (because it was changed in ...)
concat << texFullPath; //Restore the base path
PS = new PlayerShip(&TexMan, concat); //Call the constructor
The constructor's code
PlayerShip::PlayerShip(TextureManager * TexMan, std::stringstream &path)
{
texId = 2;
std::cout << path.str(); //First path above
path << "Ship01.tga";
std::cout << path.str(); //Second - this is the messed up one
//Do more fun stuff
}
Does anyone have any idea why it's "overwriting" what's already in the stringstream?
why its "overwriting" what's already in the stringstream
Because output places characters at the "put pointer" position in the output buffer. A freshly-constructed stream has the put pointer set to zero (except for file output streams opened in append mode), thus your output overwrites the characters already in the buffer.
If you really need to append strings this way, you need to move the put pointer to the end of the buffer:
std::cout << p.str(); //First path above
std::stringstream path;
path.str(p.str());
path.seekp(0, std::ios_base::end); // <-- add this
path << "Ship01.tga";
std::cout << "Loading player ship from " << path.str();
EDIT: The question has been edited and the code after the edit works, because it no longer uses path.str(p.str()); to create the output buffer without using an output operation (and without advancing the put pointer): see ideone for differences.
In any case, strings themselves can be concatenated, which would make the code easier to follow:
std::string p = path.str() + "Ship01.tga";
std::cout << p;
Not to mention that for dealing with files and pathnames, we have boost.filesystem.
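The put-pointer behaviour described in this answer can be reproduced in isolation. The sketch below uses hypothetical helper names that are not from the question; it contrasts writing right after str(), writing after seekp, and constructing the stream with std::ios_base::ate, which is a standard open mode that positions the put pointer at the end from the start.

```cpp
#include <sstream>
#include <string>

// Hypothetical helpers for illustration only.

// str() replaces the buffer but leaves the put pointer at the beginning,
// so the next << overwrites the start of the existing text.
std::string append_after_str(const std::string& base, const std::string& name)
{
    std::stringstream path;
    path.str(base);
    path << name;
    return path.str();
}

// Moving the put pointer to the end first makes << append instead.
std::string append_after_seekp(const std::string& base, const std::string& name)
{
    std::stringstream path;
    path.str(base);
    path.seekp(0, std::ios_base::end);
    path << name;
    return path.str();
}

// Constructing with ios_base::ate starts with the put pointer at the end.
std::string append_with_ate(const std::string& base, const std::string& name)
{
    std::stringstream path(base, std::ios_base::in | std::ios_base::out | std::ios_base::ate);
    path << name;
    return path.str();
}
```

With base "abcdef" and name "XY", the first helper yields "XYcdef", while the other two yield "abcdefXY".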

Parsing Data of data from a file

I have this project due; however, I am unsure how to parse the data into the word, its part of speech, and its definition. I know that I should make use of the tab spacing to read it, but I have no idea how to implement it. Here is an example of the file:
Recollection n. The power of recalling ideas to the mind, or the period within which things can be recollected; remembrance; memory; as, an event within my recollection.
Nip n. A pinch with the nails or teeth.
Wodegeld n. A geld, or payment, for wood.
Xiphoid a. Of or pertaining to the xiphoid process; xiphoidian.
NB: Each word and part of speech and definition is one line in a text file.
If you can be sure that the definition will always follow the first period on a line, you could use an implementation like this. But it will break if there are ever more than 2 periods on a single line.
string str;
vector<pair<string,string>> v; // <word, definition>
while (getline(fileStream >> ws, str, '.')) { // skip leading whitespace, then read up to the first '.'
size_t space = str.find_last_of(' '); // the part of speech (n, v, a, ...) follows the last space
if (space != string::npos) str.erase(space); // keep only the word
v.push_back(make_pair(str, string())); // push the word with an empty definition
getline(fileStream, str, '.'); // read up to the next '.': the definition
v.back().second = str; // store the definition in the last added element
}
for (const auto& x : v) { // check your results
cout << "word -> " << x.first << endl;
cout << "definition -> " << x.second << endl << endl;
}
The better solution would be to learn Regular Expressions. It's a complicated topic but absolutely necessary if you want to learn how to parse text efficiently and properly:
http://www.cplusplus.com/reference/regex/
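As a rough sketch of the regular-expression approach, the code below splits one line into word, part of speech, and definition with std::regex. The Entry struct, the parse_entry name, and the pattern itself are assumptions made for illustration: the pattern expects a single-token word, then a token ending in a period, then the rest of the line, which matches the sample lines but would not handle multi-word headwords.

```cpp
#include <regex>
#include <string>

// Hypothetical record type for one dictionary line.
struct Entry {
    std::string word;
    std::string part_of_speech;
    std::string definition;
};

// Hypothetical helper: returns false when the line does not fit the
// assumed "word pos. definition" layout.
bool parse_entry(const std::string& line, Entry& entry)
{
    // word, whitespace, token ending in '.', whitespace, rest of the line
    static const std::regex pattern(R"(^(\S+)\s+(\S+\.)\s+(.*)$)");
    std::smatch match;
    if (!std::regex_match(line, match, pattern)) {
        return false;
    }
    entry.word = match[1].str();
    entry.part_of_speech = match[2].str();
    entry.definition = match[3].str();
    return true;
}
```

Reading the file then reduces to calling getline for each line and passing it to parse_entry, so extra periods inside a definition no longer matter.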

How to assign string a char array that starts from the middle of the array?

For example in the following code:
char name[20] = "James Johnson";
I want to assign all the characters from just after the white space to the end of the char array to a string, so that the string ends up like the following (this is not meant as an initialization, it just shows the idea):
string s = "Johnson";
Therefore, essentially, the string will only accept the last name. How can I do this?
I think you want something like this:
string s="";
for(int i=strlen(name)-1;i>=0;i--)
{
if(name[i]==' ')break;
else s+=name[i];
}
reverse(s.begin(),s.end());
You need to
#include <algorithm>
There's always more than one way to do it - it depends on exactly what you're asking.
You could either:
search for the position of the first space, and then point a char* at one-past-that position (look up strchr in <cstring>)
split the string into a list of sub-strings, where your split character is a space (look up strtok or boost split)
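The first option, using strchr, might look like the sketch below. The function name after_first_space is made up for this example, and returning the whole input when there is no space is just one possible choice.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: copies everything after the first space into a
// std::string, or the whole input when it contains no space.
std::string after_first_space(const char* name)
{
    const char* space = std::strchr(name, ' ');  // null when no space is found
    return space ? std::string(space + 1) : std::string(name);
}
```

For the array in the question, after_first_space(name) gives "Johnson".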
std::string has a whole arsenal of functions for string manipulation, and I recommend you use those.
You can find the first whitespace character using std::string::find_first_of, and split the string from there:
char name[20] = "James Johnson";
// Convert whole name to string
std::string wholeName(name);
// Create a new string from the whole name starting from one character past the first whitespace
std::string lastName(wholeName, wholeName.find_first_of(' ') + 1);
std::cout << lastName << std::endl;
If you're worried about multiple names, you can also use std::string::find_last_of
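A minimal sketch of the find_last_of variant, with the npos case handled explicitly. The name last_name is a hypothetical helper, not something from the standard library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the text after the last space, or the whole
// string when it contains no space at all.
std::string last_name(const std::string& wholeName)
{
    const std::size_t space = wholeName.find_last_of(' ');
    if (space == std::string::npos) {
        return wholeName;
    }
    return wholeName.substr(space + 1);
}
```

With this, "James Johnson" gives "Johnson" and "Mary Ann Smith" gives "Smith".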
If you're worried about the names not being separated by a space, you could use std::string::find_first_not_of and search for letters of the alphabet. The example given in the link is:
std::string str ("look for non-alphabetic characters...");
std::size_t found = str.find_first_not_of("abcdefghijklmnopqrstuvwxyz ");
if (found!=std::string::npos)
{
std::cout << "The first non-alphabetic character is " << str[found];
std::cout << " at position " << found << '\n';
}

C++: How to extract a string from RapidXml

In my C++ program I want to parse a small piece of XML, insert some nodes, then extract the new XML (preferably as a std::string).
RapidXml has been recommended to me, but I can't see how to retrieve the XML back as a text string.
(I could iterate over the nodes and attributes and build it myself, but surely there's a built-in function that I am missing.)
Thank you.
Although the documentation is poor on this topic, I managed to get some working code by looking at the source. It is missing the XML header, though, which normally contains important information. Here is a small example program that does what you are looking for using rapidxml:
#include <iostream>
#include <sstream>
#include "rapidxml/rapidxml.hpp"
#include "rapidxml/rapidxml_print.hpp"
int main(int argc, char* argv[]) {
char xml[] = "<?xml version=\"1.0\" encoding=\"latin-1\"?>"
"<book>"
"</book>";
//Parse the original document
rapidxml::xml_document<> doc;
doc.parse<0>(xml);
std::cout << "Name of my first node is: " << doc.first_node()->name() << "\n";
//Insert something
rapidxml::xml_node<> *node = doc.allocate_node(rapidxml::node_element, "author", "John Doe");
doc.first_node()->append_node(node);
std::stringstream ss;
ss <<*doc.first_node();
std::string result_xml = ss.str();
std::cout <<result_xml<<std::endl;
return 0;
}
Use print function (found in rapidxml_print.hpp utility header) to print the XML node contents to a stringstream.
rapidxml::print requires an output iterator to generate the output, so a character array works with it. But this is risky, because I cannot know whether a fixed-length array (say, 2048 bytes) is long enough to hold all the content of the XML.
The right way to do this is to pass in an output iterator of a string stream, which allows the buffer to expand as the XML is dumped into it.
My code is like below:
std::stringstream stream;
std::ostream_iterator<char> iter(stream);
rapidxml::print(iter, doc, rapidxml::print_no_indenting);
printf("%s\n", stream.str().c_str());
printf("len = %zu\n", stream.str().size()); // size() returns size_t, so use %zu
If you do build XML yourself, don't forget to escape the special characters. This tends to be overlooked, but can cause some serious headaches if it is not implemented:
< &lt;
> &gt;
& &amp;
" &quot;
' &apos;
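A small sketch of such an escaping step, applying the five replacements listed above. escape_xml is a hypothetical helper written for this example and is not part of RapidXML.

```cpp
#include <string>

// Hypothetical helper: replaces the five XML special characters with their
// predefined entities and copies everything else unchanged.
std::string escape_xml(const std::string& text)
{
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '&':  out += "&amp;";  break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}
```

Escaping happens in a single pass over the input, so an '&' that is part of the original text is escaped once and the entities produced by the function are never escaped again.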
Here's how to print a node to a string straight from the RapidXML Manual:
xml_document<> doc; // character type defaults to char
// ... some code to fill the document
// Print to stream using operator <<
std::cout << doc;
// Print to stream using print function, specifying printing flags
print(std::cout, doc, 0); // 0 means default printing flags
// Print to string using output iterator
std::string s;
print(std::back_inserter(s), doc, 0);
// Print to memory buffer using output iterator
char buffer[4096]; // You are responsible for making the buffer large enough!
char *end = print(buffer, doc, 0); // end contains pointer to character after last printed character
*end = 0; // Add string terminator after XML
If you aren't yet committed to Rapid XML, I can recommend some alternative libraries:
Xerces - This is probably the de facto C++ implementation.
XMLite - I've had some luck with this minimal XML implementation. See the article at http://www.codeproject.com/KB/recipes/xmlite.aspx
Use static_cast<>
Ex:
rapidxml::xml_document<> doc;
std::string strBuff;
doc.parse<0>(xml); // parse first; the document has no nodes before this
rapidxml::xml_node<> *root_node = doc.first_node();
// ...
strBuff = static_cast<std::string>(root_node->first_attribute("attribute_name")->value());
The following is very easy:
std::string s;
print(back_inserter(s), doc, 0);
cout << s;
You only need to include "rapidxml_print.hpp" header in your source code.