How to parse an XML file with RapidXml - c++

I have to parse an XML file in C++. I was researching and found the RapidXml library for this.
I have doubts about doc.parse<0>(xml).
Can xml be an .xml file or does it need to be a string or char *?
If I can only use string or char * then I guess I need to read the whole file and store it in a char array and pass the pointer of it to the function?
Is there a way to directly use a file because I would need to change the XML file inside the code also.
If that is not possible in RapidXml then please suggest some other XML libraries in C++.
Thanks!!!
Ashd

RapidXml comes with a class to do this for you, rapidxml::file in the rapidxml_utils.hpp file.
Something like:
#include "rapidxml_utils.hpp"
int main() {
rapidxml::file<> xmlFile("somefile.xml"); // Default template is char
rapidxml::xml_document<> doc;
doc.parse<0>(xmlFile.data());
...
}
Note that the xmlFile object now contains all of the data for the XML, which means that once it goes out of scope and is destroyed the doc variable is no longer safely usable. If you call parse inside of a function, you must somehow retain the xmlFile object in memory (global variable, new, etc) so that the doc remains valid.

New to C++ myself... but I wanted to share a solution.
YMMV!
Shout Out to SiCrane on this thread:
-- and just replacing 'string' with a vector --- (thanks anno)
Please comment and help me learn also! I'm very new to this
Anyway, this seems to work for a good start:
#include <iostream>
#include <fstream>
#include <vector>
#include "../../rapidxml/rapidxml.hpp"
using namespace std;
int main(){
ifstream myfile("sampleconfig.xml");
rapidxml::xml_document<> doc;
/* "Read file into vector<char>" See linked thread above*/
vector<char> buffer((istreambuf_iterator<char>(myfile)), istreambuf_iterator<char>( ));
buffer.push_back('\0');
cout<<&buffer[0]<<endl; /*test the buffer */
doc.parse<0>(&buffer[0]);
cout << "Name of my first node is: " << doc.first_node()->name() << "\n"; /*test the xml_document */
}

We usually read the XML from the disk into a std::string, then make a safe copy of it into a std::vector<char> as demonstrated below:
string input_xml;
string line;
ifstream in("demo.xml");
// read file into input_xml
while(getline(in,line))
input_xml += line;
// make a safe-to-modify copy of input_xml
// (you should never modify the contents of an std::string directly)
vector<char> xml_copy(input_xml.begin(), input_xml.end());
xml_copy.push_back('\0');
// only use xml_copy from here on!
xml_document<> doc;
// we are choosing to parse the XML declaration
// parse_no_data_nodes prevents RapidXML from using the somewhat surprising
// behavior of having both values and data nodes, and having data nodes take
// precedence over values when printing
// >>> note that this will skip parsing of CDATA nodes <<<
doc.parse<parse_declaration_node | parse_no_data_nodes>(&xml_copy[0]);
For a complete source code check:
Read a line from xml file using C++

The manual tells us:
function xml_document::parse
[...] Parses zero-terminated XML string
according to given flags.
RapidXML leaves loading the character data from a file to you. Either read the file into a buffer, like anno suggested or alternatively use some memory mapping technique. (But look up parse_non_destructive flag first.)
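The "read the file into a buffer" step can be factored into a small helper. What follows is a minimal sketch using only the standard library; the function name read_zero_terminated is hypothetical and not part of RapidXml, and the RapidXml calls appear only in a comment because they assume rapidxml.hpp is on your include path. Unlike a getline loop, this keeps the line breaks from the file.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of RapidXml): loads a whole file into a
// writable, zero-terminated buffer, preserving line breaks.
std::vector<char> read_zero_terminated(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);

    std::vector<char> buffer((std::istreambuf_iterator<char>(in)),
                             std::istreambuf_iterator<char>());
    buffer.push_back('\0');          // parse() expects a zero-terminated string
    assert(buffer.back() == '\0');   // postcondition the parser relies on
    return buffer;
}

// Intended use (assumes rapidxml.hpp is available):
//   std::vector<char> xml = read_zero_terminated("demo.xml");
//   rapidxml::xml_document<> doc;
//   doc.parse<0>(&xml[0]);   // xml must stay alive as long as doc is used
```

Returning the vector by value leaves ownership with the caller, which makes it easier to keep the buffer alive for as long as the document is in use.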

Related

How to handle file I/O outside of main when redirecting input and output?

A restriction of the program I am working on is that it should be invoked as: ./a.out < input.txt > output.txt. The input of this program should be read from the first file, and the output should be written to the second.
So, this redirects standard input and output from and to these two files. I could simply, from main() for example, call std::cin and std::cout. However, I have a dedicated component which adapts my input from a file to an intermediate structure that I use elsewhere in my program.
In order to build this struct I could #include <iostream> in this component and read with std::cin from input.txt. However, I don't like the idea of including iostream here, and I am not sure how I would test this.
My issue comes from the I/O redirect, if the executable were invoked with filenames as strings, I would do something along the lines of
InputAdapter inputAdapter;
ifstream infile;
infile.open(filename ,std::ios_base::in);
auto structHoldingParsedInput = inputAdapter.adapt(infile);
How can I achieve something similar here?
I would suggest you make your adapter parameters std::istream& and std::ostream& so you can pass in either the standard std::cin/std::cout or files you open yourself like std::ifstream.
A bit like this:
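class InputAdapter
{
public:
// ParsedInput stands in for whatever intermediate structure you build
ParsedInput adapt(std::istream& in)
{
ParsedInput created_object;
// code to convert input to output here
return created_object;
}
};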
// ...
InputAdapter inputAdapter;
std::ifstream in("input_file");
auto structHoldingParsedInput = inputAdapter.adapt(in);
Now that you are coding to streams rather than files, you can use any stream, for example the standard input stream:
auto structHoldingParsedInput = inputAdapter.adapt(std::cin);
And, for testing you could use std::istringstream:
std::istringstream test_stream(R"(
put your test data in here
)");
auto structHoldingParsedInput = inputAdapter.adapt(test_stream);
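Putting the pieces together, here is a minimal self-contained sketch of the stream-based adapter and a test for it. The ParsedInput struct and the line-by-line parsing are assumptions made for illustration, since the question does not show the real intermediate structure. Note that the adapter itself only needs the <istream> header, not <iostream>.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical intermediate structure; the real one is not shown in the question.
struct ParsedInput
{
    std::vector<std::string> lines;
};

class InputAdapter
{
public:
    // Accepts any input stream: std::cin, std::ifstream, std::istringstream, ...
    ParsedInput adapt(std::istream& in) const
    {
        ParsedInput result;
        std::string line;
        while (std::getline(in, line))   // stops at end of input or on error
            result.lines.push_back(line);
        return result;
    }
};

// A test needs neither a file nor shell redirection: feed the adapter from memory.
void test_adapter()
{
    std::istringstream in("first\nsecond\n");
    const ParsedInput parsed = InputAdapter().adapt(in);
    assert(parsed.lines.size() == 2);
    assert(parsed.lines[0] == "first");
    assert(parsed.lines[1] == "second");
}
```

In the real program, main() would then call adapt(std::cin), and the shell redirection ./a.out < input.txt takes care of the rest.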

Extract JSON data from file in C++

Hi there, sorry if this question is not well-suited for this forum. I'm pretty new to programming and thought I'd get a better command of strings and files by creating this little project. What I'm trying to do is extract data from a JSON document. Eventually I'd store the data in an array I suppose and work with it later.
Basically, I'm wondering if there is a better way of going about this. The code seems kind of wordy and definitely not elegant. Again, sorry if this question is not a good one, but I figured there'd be no better way to learn than through a community like this.
#include <iostream>
#include <fstream>
#include <cstring>
#include <string> //probably including more than necessary
using namespace std; //should be specifying items using scope resolution operator instead
int main(int argc, const char * argv[])
{
ifstream sfile("JSONdatatest.txt");
string line,temp;
while(!sfile.eof()){
getline(sfile, line);
temp.append(line); //creates string from file text, use of temp seems extraneous
}
sfile.close();
cout << "Reading from the file.\n";
size_t counter=0;
size_t found=0;
size_t datasize=0;
while(found!=string::npos && found<1000*70){ //problem here, program was creating infinite loop
//initial 'solution' was to constrain found var
//but fixed with if statement
found = temp.find("name: ",counter);
if(found!=string::npos){
found=found+7; //length of find variable "name: ", puts us to the point where data begins
size_t ended=temp.find_first_of( "\"", found);
size_t len=ended-found; //length of datum to extract
string temp2(temp, found, len); //odd use of a second temp function,
cout << temp2 << endl;
counter=ended+1;
datasize++; //also problem with data size and counter, so many counters, can they
//coordinate to have fewer?
}
}
cout << datasize;
return 0;
}
Where I indicate an infinite loop is made, I fixed by adding the if statement in the while loop. My guess is because I add 7 to 'found' there is a chance it skips over npos and the loop continues. Adding the if statement fixed it, but made the code look clunky. There has to be a more elegant solution.
Thanks in advance!
I would recommend that you use a third-party library for this, since parsing JSON with raw string tools is pretty tough. I did this kind of thing recently, so I can give you some help.
I would recommend you take a look at boost::property_tree.
Here is the theory: a JSON file is like a tree, with a root and many branches.
The idea is to load the JSON file into a boost::property_tree::ptree, so that you then work with the ptree object rather than with the file.
First, let's say we have this JSON file:
{
"document": {
"person": {
"name": "JOHN",
"age": 21
},
"code": "AX-GFD123"
},
"body" : "none"
}
Then in your code, be sure to include:
#include "boost/property_tree/ptree.hpp"
#include "boost/property_tree/json_parser.hpp"
#include "boost/foreach.hpp" // needed for the BOOST_FOREACH loop further down
Then here is the most interesting part:
boost::property_tree::ptree root;
You create the ptree object named root.
boost::property_tree::read_json("/path_to_my_file/doc.json", root);
Then you tell it which file to read and where to store the result (here, in root). Be careful: wrap this call in try / catch, because read_json throws if the file doesn't exist or cannot be parsed.
From then on you only work with the root tree, which is easy. It has many member functions (see the Boost documentation page).
Say you want to access the name field. To do that:
std::string myname = root.get<std::string> ("document.person.name", "NOT FOUND");
The first parameter of get is the path to the attribute you want; the second is the default value returned if the path is incorrect or doesn't exist. The <std::string> specifies the type to return.
Let's finish with another example. Say you want to visit all the children of the root, that is, every node on the top level.
BOOST_FOREACH(const boost::property_tree::ptree::value_type& child, root.get_child(""))
{ cout << child.first << endl; }
This is a bit more complicated, so let me explain. root.get_child("") tells Boost to look at every child of the root; the empty path "" refers to the root itself. Each child found is then bound, as with an ordinary iterator, to const boost::property_tree::ptree::value_type& child.
Inside the loop you use child to access whatever you want. child.first gives you the name of the child node currently being visited. In my example it will print document first, and then body.
I invite you to have a look at the Boost documentation. It may look hard at first, but it is really easy to use once you get going.
http://www.boost.org/doc/libs/1_41_0/doc/html/property_tree.html
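If pulling in Boost is not an option, the hand-rolled search from the question can at least be tightened. The sketch below is one possible shape, not a JSON parser: it ignores escapes and nesting, and the function name extract_values is made up for illustration. Making the find call the loop condition removes the wrap-around problem (npos plus an offset is a small number again), and the size of the returned vector replaces the separate datasize counter.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Collects the text between each occurrence of `key` and the next double
// quote. Hypothetical helper; it does not understand JSON escapes or nesting.
std::vector<std::string> extract_values(const std::string& text,
                                        const std::string& key)
{
    assert(!key.empty());   // an empty key would match at every position

    std::vector<std::string> values;
    std::string::size_type pos = 0;
    // find() is the loop condition, so npos ends the loop before any arithmetic.
    while ((pos = text.find(key, pos)) != std::string::npos)
    {
        const std::string::size_type begin = pos + key.size();
        const std::string::size_type end = text.find('"', begin);
        if (end == std::string::npos)
            break;          // unterminated value: stop searching
        values.push_back(text.substr(begin, end - begin));
        pos = end + 1;      // continue after the closing quote
    }
    return values;
}
```

With the question's input format this would be called as extract_values(temp, "name: \""), so the opening quote is part of the key instead of a hard-coded offset of 7.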

Solving an exercise from Thinking C++

The exercise says:
Create a Text class that contains a string object to hold the text of
a file. Give it two constructors: a default constructor and a
constructor that takes a string argument that is the name of the file
to open. When the second constructor is used, open the file and read
the contents into the string member object. Add a member function
contents() to return the string so (for example) it can be printed. In
main( ), open a file using Text and print the contents.
This is the class that I wrote:
class Text {
string fcontent;
public:
Text();
Text(string fname);
~Text();
string contents();
};
I haven't understood everything in this exercise. It asks me to create a function contents() that returns a string, but it doesn't say what the function has to do...
Nor does it say what the default constructor has to do.
Could someone help me?
The function has to return the contents of the file, which is stored (in your case) in fcontent.
string Text::contents()
{
return fcontent;
}
The default constructor doesn't have to do anything in this case.
Text::Text(){}
EDIT:
Seeing how many comments there are below with new problems, I'm going to recap and answer the rest of the questions here.
In Text.h you have:
#ifndef TEXT_HH
#define TEXT_HH
#include <string> //[1]
class Text {
std::string fcontent;//[2]
public:
Text();
Text(std::string fname);
~Text();
std::string contents();
};
#endif
and Text.cpp has
// Text.cpp
#include "Text.h"
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
Text::Text() {}
Text::Text(string fname) {
fstream f;
f.open(fname.c_str(), ios::in);//[3]
//[4]
std::stringstream stream;
string line;
// getline's result is tested before the line is used,
// so a failed read adds nothing to the stream
while(getline(f, line))
{
//This adds a newline after every line, including the last one
stream << line << '\n';
}
fcontent = stream.str();
//remove the extra newline (an empty file leaves nothing to remove)
if(!fcontent.empty())
fcontent.erase(fcontent.size() - 1);
f.close();//This is technically unnecessary, but not bad either
}
string Text::contents() {
return fcontent;
}
Text::~Text() {}//[5]
Point 1: The header file <string> contains the class definition for std::string, the C++ string. This should not be confused with <cstring> which contains functions for manipulating C strings (const char *, const char[], etc).
Point 2: The string class lives in the std namespace, which means we either write std::string every time we use the class or write using namespace std; to make its names visible without qualification. In the header file we prefer the former, because a using directive in a header doesn't go away: it affects every header and source file that includes this one, which we want to avoid in general (i.e. always). In the cpp file, however, the using directive causes no such problem, so we use it.
Point 3: fstreams take a C string as the filename parameter (since C++11 they also accept a std::string directly). We can get the corresponding C string from a C++ string with the call c_str(), which returns a const char *.
Point 4: Reading a whole text file into a string is less obvious than it seems because of the way streams handle eof (end-of-file) and state checking. In short, the eof flag is only set after a read has already hit the end of the file, so a loop that tests eof before reading runs one more time than you want. That's why the state is checked after calling getline and before adding what has been read to our stringstream. Streams are a fairly elaborate topic, so I won't go into more detail here.
Point 5: Destructors of member objects (non-pointers, like our fcontent) are called automatically, so we don't need to do anything to make sure that our fcontent string is destroyed when our Text object is destroyed. Only when we allocate something dynamically with new do we have to worry about calling delete on it when we want to destroy it.
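As an alternative to the line-by-line loop discussed in Point 4, the whole file can be copied in one step through the stream buffer. This is a minimal sketch rather than the book's intended solution; the free function read_file is a hypothetical name, and its body could serve as the body of Text::Text(string fname) with the result assigned to fcontent.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the complete contents of a text file.
std::string read_file(const std::string& fname)
{
    assert(!fname.empty());   // an empty name can never be opened

    std::ifstream f(fname.c_str());
    if (!f)
        throw std::runtime_error("cannot open " + fname);

    std::ostringstream stream;
    stream << f.rdbuf();      // copies everything up to end-of-file
    return stream.str();      // f is closed automatically when it goes out of scope
}
```

Because nothing is added or stripped per line, the string matches the file exactly, including a trailing newline if the file has one.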

How do I get a string or stream into a CStreamFile?

This question may seem a bit too specific, but I figure I'd give it a shot here since I've found some great programming answers here in the past.
I am modifying the open-source program TinyCad for a project I'm working on. I've visited the TinyCad message board and posted, but I didn't get the answer I'm looking for. I'm having trouble wrapping my head around how to integrate a small XML converter class I wrote into the loading function of TinyCad.
A little background about me: I have no experience with MFC or Visual Studio, but that is what I have to use. I am used to C++ and was taught using iostream syntax (cout, cin, new, etc.) so I'm not used to older C code (like printf, sprintf, malloc, alloc, etc.) either. I usually write my programs from start to finish in Qt, but I was told that for this project I should modify an existing program to save time. I don't know if it'll save that much time if I have to learn something totally foreign, but I digress.
I wrote a small class to read in an XML file that is structured differently than the XML file that TinyCad reads in. My class converts it and outputs an intermediate XML file. Well, I don't want to spit out an intermediate file. I modified it to save the output as a string (using the std::string type from the standard C++ library). I want to get this string into a stream so that TinyCad can open the file, do the conversion, and then continue loading.
My class is called like so:
std::string blah;
char* filename = "library.xml";
XMLopen myXML(filename, blah);
So it takes in a filename, opens the file, parses the relevant information out of the file, puts the information into TinyCad's XML structure, and saves the XML code as a string that has been passed by reference.
I had an idea to use istringstream to make a stream, but that did not play nice with CFile. I tried it like so:
istringstream ins; // Declare an input string stream.
ins.str(blah);
// First open the stream to save to
CFile theFile(ins);
Below is the code in TinyCad that opens and loads the selected XML file:
void CLibraryStore::LoadXML( const TCHAR *filename )
{
// First open the stream to save to
CFile theFile;
// Open the file for saving as a CFile for a CArchive
BOOL r = theFile.Open(filename, CFile::modeRead);
if (r)
{
CString name;
// Create the XML stream writer
CStreamFile stream( &theFile, CArchive::load );
CXMLReader xml( &stream );
// Get the library tag
xml.nextTag( name );
if (name != "Library")
{
Message(IDS_ABORTVERSION,MB_ICONEXCLAMATION);
return;
}
xml.intoTag();
CTinyCadApp::SetLockOutSymbolRedraw( true );
while ( xml.nextTag( name ) )
{
// Is this a symbol?
if (name == "SYMBOL")
{
// Load in the details
xml.intoTag();
CTinyCadMultiSymbolDoc temp_doc;
drawingCollection drawing;
CLibraryStoreNameSet s;
// this is where the stream gets sent to be loaded into the data structure
s.LoadXML( &temp_doc, xml );
xml.outofTag();
// ... and store the symbol
Store( &s, temp_doc );
}
}
xml.outofTag();
CTinyCadApp::SetLockOutSymbolRedraw( false );
}
}
Edit 7/28/2010 5:55PM
So I tried to make a stream, but it fails.
CStreamFile takes in a filename and then gets set as a CArchive:
m_pArchive = new CArchive( theFile, nmode );
I tried to make a CStream like so (since CStreamFile is an overloaded CStream):
CString test = blah.c_str();
CStreamMemory streamCS;
streamCS << test;
CXMLReader xml( &streamCS );
But at streamCS << test; it doesn't put the stream in at all. test gets assigned correctly with blah so I know that's working.
Any ideas on how to approach this?

Why doesn't boost::serialization check for tag names in XML archives?

I'm starting to use boost::serialization on XML archives. I can produce and read data, but when I hand-modify the XML and interchange two tags, it "fails to fail" (i.e. it proceeds happily).
Here's a small, self-contained example showing what I see:
#include <iostream>
#include <fstream>
#include <boost/archive/xml_oarchive.hpp>
#include <boost/archive/xml_iarchive.hpp>
#include <boost/serialization/nvp.hpp>
#include <boost/serialization/split_member.hpp>
using namespace std;
int main (void)
{
boost::archive::xml_oarchive oa (cout);
static const string producer = "XXX", version = "0.0.1";
oa << boost::serialization::make_nvp ("producer", producer);
oa << boost::serialization::make_nvp ("producer_version", version);
}
This writes XML to standard output, which contains:
<producer>XXX</producer>
<producer_version>0.0.1</producer_version>
Now, I replace all the code in the main function with a reader:
boost::archive::xml_iarchive ia (cin);
string producer, version;
ia >> boost::serialization::make_nvp ("producer", producer);
ia >> boost::serialization::make_nvp ("producer_version", version);
cout << producer << " " << version << endl;
which works as expected when fed the previous output (outputs "XXX 0.0.1"). If, however, I feed it XML in which I changed the order of the two lines "producer" and "producer_version", it still runs and outputs "0.0.1 XXX".
Thus, it fails to recognize that the tags don't have the expected names, and just proceeds. I would have expected it to throw an xml_archive_parsing_error exception, as indicated in the doc.
Does someone here have experience with that? What am I doing wrong?
Just changing the order of the two lines won't cause an xml_archive_parsing_error exception. The doc you've linked says that itself:
(...)This might be possible if only the data is changed and not the XML attributes and nesting structure is left unaltered.(...)
You haven't changed attributes and the order change has kept the structure (still two fields on the first level of your XML). No exception will ever be thrown this way.
Also, when reading XML using the make_nvp function, the name parameter puts no restriction on what is being read. It only supplies the name to associate with the new name-value pair.
So you can change the names of the XML tags in your input, as long as you don't change the expected order. For example, you could rename producer and producer_version in your XML to foo and bar and the serialized data would still be read correctly:
<foo>XXX</foo>
<bar>0.0.1</bar>
And your printed answer would still be "XXX 0.0.1".
Since this is only a matter of formatting your serialized data as XML, there is no interest in checking the tag names. They are only used to make your serialized output more readable.