C++: How to extract a string from RapidXml

In my C++ program I want to parse a small piece of XML, insert some nodes, then extract the new XML (preferably as a std::string).
RapidXml has been recommended to me, but I can't see how to retrieve the XML back as a text string.
(I could iterate over the nodes and attributes and build it myself, but surely there's a built-in function that I am missing.)
Thank you.

Although the documentation is poor on this topic, I managed to get some working code by looking at the source. Note that the output is missing the XML header (the <?xml ...?> declaration), which normally contains important information. Here is a small example program that does what you are looking for using RapidXml:
#include <iostream>
#include <sstream>
#include "rapidxml/rapidxml.hpp"
#include "rapidxml/rapidxml_print.hpp"
int main(int argc, char* argv[]) {
char xml[] = "<?xml version=\"1.0\" encoding=\"latin-1\"?>"
"<book>"
"</book>";
//Parse the original document
rapidxml::xml_document<> doc;
doc.parse<0>(xml);
std::cout << "Name of my first node is: " << doc.first_node()->name() << "\n";
//Insert something
rapidxml::xml_node<> *node = doc.allocate_node(rapidxml::node_element, "author", "John Doe");
doc.first_node()->append_node(node);
std::stringstream ss;
ss <<*doc.first_node();
std::string result_xml = ss.str();
std::cout <<result_xml<<std::endl;
return 0;
}

Use the print function (found in the rapidxml_print.hpp utility header) to print the XML node contents to a stringstream.

rapidxml::print requires an output iterator to generate the output, so a plain character array works with it. But this is risky, because I cannot know whether a fixed-length array (say, 2048 bytes) is long enough to hold the whole XML document.
The right way to do this is to pass in an output iterator for a string stream, so that the buffer can grow as the XML is written into it.
My code looks like this:
std::stringstream stream;
std::ostream_iterator<char> iter(stream);
rapidxml::print(iter, doc, rapidxml::print_no_indenting);
printf("%s\n", stream.str().c_str());
printf("len = %zu\n", stream.str().size()); // size() returns size_t, so use %zu rather than %d

If you do build XML yourself, don't forget to escape the special characters. This tends to be overlooked, but can cause some serious headaches if it is not implemented:
< &lt;
> &gt;
& &amp;
" &quot;
' &apos;
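A minimal sketch of such an escaping step, assuming plain char text. The name escape_xml is hypothetical and is not part of RapidXml; it only maps the five characters listed above to their predefined entities.

```cpp
#include <string>

// Hypothetical helper (not part of RapidXml): replaces the five characters
// that are special in XML with their predefined entity references.
std::string escape_xml(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '&':  out += "&amp;";  break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}
```

The ampersand needs no special ordering here because each input character is examined exactly once, so an entity that was just emitted is never escaped a second time.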

Here's how to print a node to a string straight from the RapidXML Manual:
xml_document<> doc; // character type defaults to char
// ... some code to fill the document
// Print to stream using operator <<
std::cout << doc;
// Print to stream using print function, specifying printing flags
print(std::cout, doc, 0); // 0 means default printing flags
// Print to string using output iterator
std::string s;
print(std::back_inserter(s), doc, 0);
// Print to memory buffer using output iterator
char buffer[4096]; // You are responsible for making the buffer large enough!
char *end = print(buffer, doc, 0); // end contains pointer to character after last printed character
*end = 0; // Add string terminator after XML

If you aren't yet committed to Rapid XML, I can recommend some alternative libraries:
Xerces - This is probably the de facto C++ implementation.
XMLite - I've had some luck with this minimal XML implementation. See the article at http://www.codeproject.com/KB/recipes/xmlite.aspx

Use static_cast<>
Ex:
rapidxml::xml_document<> doc;
std::string strBuff;
doc.parse<0>(xml); // parse first; first_node() returns null on an empty document
rapidxml::xml_node<> *root_node = doc.first_node();
.
.
.
strBuff = static_cast<std::string>(root_node->first_attribute("attribute_name")->value());

The following is very easy:
std::string s;
print(back_inserter(s), doc, 0);
cout << s;
You only need to include "rapidxml_print.hpp" header in your source code.

Related

How to print the content of a string as "string literal source"

Suppose s is
a
b
c
const std::string s =
std::cout << R"( s )" << std::endl;
How can I std::cout the content of the string as a raw literal? I mean that cout should print the value in this format: "a\nb\nc".
I need to turn a very large text into a std::string.
I can't read it from a file, as I need to define its value inside the source.
What you would need to do is scan the string, replace all occurrences of the characters you are interested in (such as carriage return, tab, etc.) with printable escape sequences, and then print this new text.
Here is somewhat crude proof of concept:
#include <algorithm>
#include <array>
#include <string>
#include <string_view>
#include <utility>
std::string escape(std::string_view src) {
    std::string ret;
    ret.reserve(src.size() * 2); // at worst, the string consists solely of escapable symbols
    // add more chars as needed, keeping the array sorted by the first element
    static constexpr std::array escapable = {std::make_pair('\t', 't'),
                                             std::make_pair('\n', 'n')};
    for (const char ch : src) {
        const std::pair search_pair{ch, ' '};
        const auto esc_char = std::equal_range(escapable.begin(), escapable.end(), search_pair,
                                               [](const auto& a, const auto& b) { return a.first < b.first; });
        // equal_range returns an empty range when ch is not in the table
        if (esc_char.first != esc_char.second) {
            ret.push_back('\\');
            ret.push_back(esc_char.first->second);
        } else {
            ret.push_back(ch);
        }
    }
    return ret;
}
Now, you can use it:
const std::string str = "A\nbub\tfuf\n";
std::cout << escape(str) << "\n";
Above snippet prints A\nbub\tfuf\n
You could be interested by the JSON specification.
You could consider serializing your data in JSON format using open source C++ libraries like jsoncpp
You could also consider using some YAML format with the yaml-cpp library
You could be interested by the SWIG tool which generates C++ glue code.
You could consider using binary data formats like XDR.
You should specify (on paper, with a pencil) your data format in EBNF notation and use ANTLR or GNU bison to generate the parser (the printer is easier to code)
The RefPerSys project (an open source symbolic artificial intelligence system, GPLv3+ licensed) persists data in a textual format. You may borrow some of its code and re-use it in your application, if you comply with the GPL license.
Look also into Qt or POCO frameworks, but notice that DWORD64 is not a standard C++ type. See this C++ reference and read a recent C++ standard (like n3337 or better).
Consider generating your C++ serializing code with tools like GNU m4 or GPP (or your own one).
Pitrat's book Artificial Beings: the Conscience of a Conscious Machine (ISBN-13: 978-1848211018) should give you valuable insight and intuitions.
You can load this text file into a std::string like this:
Store the text in your file, e.g. mystring.txt, as a raw string literal in the format R"(raw_characters)":
R"(Run.M128A XmmRegisters[16];
BYTE Reserved4[96];", Run.CONTEXT64 := " DWORD64 P1Home;
DWORD64 P2Home;
...
)"
#include the file into a string:
namespace
{
const std::string mystring =
#include "mystring.txt"
;
}
Your IDE might flag this up as a syntax error, but it isn't. What you're doing is loading the contents of the file directly into the string at compile time.
Finally print the string:
std::cout << mystring << std::endl;
Why not just save the escaped version of the string in the file?
Anyway, here's a function to 'escape' characters:
#include <iostream>
#include <string>
#include <unordered_map>
std::string replace_all(const std::string &mystring)
{
const std::unordered_map<char, std::string> lookup =
{ {'\n', "\\n"}, {'\t', "\\t"}, {'"', "\\\""} };
std::string new_string;
new_string.reserve(mystring.length() * 2);
for (auto c : mystring)
{
auto it = lookup.find(c);
if (it != lookup.end())
new_string += it->second;
else
new_string += c;
}
return new_string;
}
int main() {
std::string mystring = R"(Run.M128A XmmRegisters[16];
BYTE Reserved4[96];", Run.CONTEXT64 := " DWORD64 P1Home;
DWORD64 P2Home;
DWORD64 P3Home;
DWORD64 P4Home;
DWORD64 P5Home;
DWORD64 P6Home;)";
auto new_string = replace_all(mystring);
std::cout << new_string << std::endl;
return 0;
}

Decoding / Encoding Text File using Stack Library - Can't Encode Large Files C++

I am working on a program that can encode and then decode text in C++. I am using the stack library. The way the program works is that it first asks you for a cypher key, which you put in manually. It then asks for the file name, which is a text file. If it is a normal txt file, it encodes the message to a new file and adds a .iia files extension. If the text file already has a .iia file extension, then it decodes the message, as long as the cypher key is the same as the one used to encode it.
My program does encode and decode, but how many characters it decodes is determined by temp.size() % cypher.length() in the while loop of the readFileEncode() function. I think this is what keeps the entire file from being encoded and then decoded correctly. In other words, after decoding, say, "example.txt.iia" back to "example.txt", the resulting file is missing a large portion of the text from the original "example.txt". I tried just cypher.length(), but of course that does not encode or decode anything. The entire encoding and decoding process depends on that expression.
I cannot seem to find out the exact logic for this to encode and decode all the characters in any size file. Here is the following code for the function that does the decoding and encoding:
EDIT: Using WhozCraig's code that he edited for me:
void readFileEncode(string fileName, stack<char> &text, string cypher)
{
std::ifstream file(fileName, std::ios::in|std::ios::binary);
stack<char> temp;
char ch;
while (file.get(ch))
temp.push(ch ^ cypher[temp.size() % cypher.length()]);
while (!temp.empty())
{
text.push(temp.top());
temp.pop();
}
}
EDIT: A stack is required. I am going to implement my own stack class, but I am trying to get this to work first with the stack library. Also, if there is a better way of implementing this, please let me know. Otherwise, I believe that there is not much wrong with this except to get it to go through the loop to encode and decode the entire file. I am just unsure as to why it stops at, say 20 characters sometimes, or ten characters. I know it has to do with how long the cypher is too, so I believe it is in the % (mod). Just not sure how to rewrite.
EDIT: Ok, tried WhozCraig's solution and I don't get the desired output, so the error now must be in my main. Here is my code for the main:
#include <iostream>
#include <iomanip>
#include <fstream>
#include <string>
#include <cstdlib>
#include <cctype>
#include <stack>
using namespace std;
void readFileEncode(string fileName, stack<char> &text, string cypher);
int main()
{
stack<char> text; // allows me to use stack from standard library
string cypher;
string inputFileName;
string outputFileName;
int position;
cout << "Enter a cypher code" << endl;
cin >> cypher;
cout << "Enter the name of the input file" << endl;
cin >> inputFileName;
position = inputFileName.find(".iia");//checks to see if the input file has the iia extension
if (position > 1){
outputFileName = inputFileName;
outputFileName.erase(position, position + 3);// if input file has the .iia extension it is erased
}
else
//outputFileName.erase(position, position + 3);// remove the .txt extension and
outputFileName = inputFileName + ".iia";// add the .iia extension to file if it does not have it
cout << "Here is the new name of the inputfile " << outputFileName << endl; // shows you that it did actually put the .iia on or erase it depending on the situation
system("pause");
readFileEncode(inputFileName, text, cypher); //calls function
std::ofstream file(outputFileName); // calling function
while (text.size()){// goes through text file
file << text.top();
text.pop(); //clears pop
}
system("pause");
}
Basically, I am reading .txt file to encrypt and then put a .iia file extension on the filename. Then I go back through, enter the file back with the .iia extension to decode it back. When I decode it back it is gibberish after about the first ten words.
@WhozCraig Does it matter what white space, newlines, or punctuation is in the file? Maybe with the full solution here you can direct me at what is wrong.
Just for information: avoid reading a file char by char; it can be painfully slow on something like a 100 MB file.
Read at least 512 bytes at a time (in my case I read 1 or 2 MB directly, store it in a char* buffer, and then process it).
If I understand what you're trying to do correctly, you want the entire file rotationally XOR'd with the chars in the cipher key. If that is the case, you can probably address your immediate error by simply doing this:
void readFileEncode(string fileName, stack<char> &text, string cypher)
{
std::ifstream file(fileName, std::ios::in|std::ios::binary);
stack<char> temp;
char ch;
while (file.get(ch))
temp.push(ch ^ cypher[temp.size() % cypher.length()]);
while (!temp.empty())
{
text.push(temp.top());
temp.pop();
}
}
The most notable changes are
Opening the file in binary mode, using std::ios::in|std::ios::binary for the open mode. This will eliminate the need to invoke the noskipws manipulator (which is usually a function call) for every character extracted.
Using file.get(ch) to extract the next character. The member will pull the next char from the file buffer directly if one is available, otherwise load the next buffer and try again.
Alternative
A character-by-character approach is going to be expensive any way you slice it. That this is going through a stack<>, which will be backed by a vector or deque, isn't going to do you any favors. That it is going through two of them just compounds the agony. You may as well load the whole file in one shot, compute all the XORs directly, then push them onto your stack via a reverse iterator:
void readFileEncode
(
const std::string& fileName,
std::stack<char> &text,
const std::string& cypher
)
{
std::ifstream file(fileName, std::ios::in|std::ios::binary);
// retrieve file size
file.seekg(0, std::ios::end);
std::istream::pos_type pos = file.tellg();
file.seekg(0, std::ios::beg);
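// early exit on an unreadable or zero-length file, or on an empty key
// (an empty key would make the modulo below divide by zero).
if (static_cast<std::streamoff>(pos) <= 0 || cypher.empty())
return;
// make space for a full read (requires #include <vector>)
const size_t len = static_cast<size_t>(pos);
std::vector<char> temp(len);
file.read(temp.data(), static_cast<std::streamsize>(len));
const size_t c_len = cypher.length();
for (size_t i=0; i<len; ++i)
temp[i] ^= cypher[i % c_len];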
for (auto it=temp.rbegin(); it!=temp.rend(); ++it)
text.push(*it);
}
You still get your stack on the caller-side, but I think you'll be considerably happier with the performance.
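Because XOR with the same key is its own inverse, encoding and decoding can share one routine, as long as every byte survives the trip to disk unchanged. A minimal in-memory sketch of that property follows; xor_with_key is a hypothetical name, and the stack is left out to keep the idea visible.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: XORs every byte of data with the key, cycling
// through the key. Applying it twice with the same key restores the input.
std::string xor_with_key(std::string data, const std::string& key) {
    if (key.empty()) return data;  // avoid a modulo by zero
    for (std::size_t i = 0; i < data.size(); ++i)
        data[i] ^= key[i % key.size()];
    return data;
}
```

If a round trip through files does not restore the original, one thing worth checking is whether the output stream is also opened with std::ios::binary. In text mode some platforms translate newline bytes on write, which would shift every later byte against the key.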

Reading a string from a file in C++

I'm trying to store strings directly into a file to be read later in C++ (basically, for the full scope, I'm trying to store an object array with string variables in a file, and those string variables will be read through something like object[0].string). However, every time I try to read the string variables the system gives me a jumbled-up error. The following code is a basic version of what I'm trying.
#include <iostream>
#include <fstream>
using namespace std;
/*
//this is run first to create the file and store the string
int main(){
string reed;
reed = "sees";
ofstream ofs("filrsee.txt", ios::out|ios::binary);
ofs.write(reinterpret_cast<char*>(&reed), sizeof(reed));
ofs.close();
}*/
//this is run after that to open the file and read the string
int main(){
string ghhh;
ifstream ifs("filrsee.txt", ios::in|ios::binary);
ifs.read(reinterpret_cast<char*>(&ghhh), sizeof(ghhh));
cout<<ghhh;
ifs.close();
return 0;
}
The second part is where things go haywire when I try to read it.
Sorry if it's been asked before, I've taken a look around for similar questions but most of them are a bit different from what I'm trying to do or I don't really understand what they're trying to do (still quite new to this).
What am I doing wrong?
You are reading from a file and trying to put the data in the string structure itself, overwriting it, which is plain wrong.
As it can be verified at http://www.cplusplus.com/reference/iostream/istream/read/ , the types you used were wrong, and you know it because you had to force the std::string into a char * using a reinterpret_cast.
C++ Hint: using a reinterpret_cast in C++ is (almost) always a sign you did something wrong.
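One way to store a std::string in a binary file without touching the string object itself is to write its length followed by its characters, and to read them back in the same order. This is only a sketch under assumptions: write_string, read_string and round_trip are hypothetical names, and the length is stored as a fixed 32-bit value in the machine's native byte order, so such a file would not be portable between platforms with different endianness.

```cpp
#include <cstdint>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes a 32-bit length, then the raw characters.
void write_string(std::ostream& out, const std::string& s) {
    const std::uint32_t len = static_cast<std::uint32_t>(s.size());
    char bytes[sizeof len];
    std::memcpy(bytes, &len, sizeof len);
    out.write(bytes, sizeof bytes);
    out.write(s.data(), static_cast<std::streamsize>(len));
}

// Hypothetical helper: reads what write_string wrote. Returns false if the
// stream ends early or is otherwise in a failed state.
bool read_string(std::istream& in, std::string& s) {
    std::uint32_t len = 0;
    char bytes[sizeof len];
    if (!in.read(bytes, sizeof bytes)) return false;
    std::memcpy(&len, bytes, sizeof len);
    s.resize(len);
    if (len == 0) return true;
    return static_cast<bool>(in.read(&s[0], static_cast<std::streamsize>(len)));
}

// In-memory round trip, using a stringstream in place of a file.
std::string round_trip(const std::string& s) {
    std::stringstream buf;
    write_string(buf, s);
    std::string back;
    read_string(buf, back);
    return back;
}
```

With real files the same two helpers would be called on an ofstream and an ifstream opened with ios::binary, once per string member of each object.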
Why is it so complicated to read a file?
A long time ago, reading a file was easy. In some Basic-like language, you used the function LOAD, and voilà!, you had your file.
So why can't we do it now?
Because you don't know what's in a file.
It could be a string.
It could be a serialized array of structs with raw data dumped from memory.
It could even be a live stream, that is, a file which is appended continuously (a log file, the stdin, whatever).
You could want to read the data word by word
... or line by line...
Or the file is so large it doesn't fit in a string, so you want to read it by parts.
etc..
The more generic solution is to read the file (thus, in C++, a fstream) byte by byte using the function get (see http://www.cplusplus.com/reference/iostream/istream/get/), convert the data yourself into the type you expect, and stop at EOF.
The std::istream interface has all the functions you need to read the file in different ways (see http://www.cplusplus.com/reference/iostream/istream/), and there is also a non-member function for std::string that reads a file until a delimiter is found (usually "\n", but it could be anything, see http://www.cplusplus.com/reference/string/getline/).
But I want a "load" function for a std::string!!!
Ok, I get it.
We assume that what you put in the file is the content of a std::string, but keeping it compatible with a C-style string, that is, the \0 character marks the end of the string (if not, we would need to load the file until reaching the EOF).
And we assume you want the whole file content fully loaded once the function loadFile returns.
So, here's the loadFile function:
#include <iostream>
#include <fstream>
#include <string>
bool loadFile(const std::string & p_name, std::string & p_content)
{
// We create the file object, saying I want to read it
std::fstream file(p_name.c_str(), std::fstream::in) ;
// We verify if the file was successfully opened
if(file.is_open())
{
// We use the standard getline function to read the file into
// a std::string, stopping only at "\0"
std::getline(file, p_content, '\0') ;
// We return the success of the operation
return ! file.bad() ;
}
// The file was not successfully opened, so returning false
return false ;
}
If you are using a C++11 enabled compiler, you can add this overloaded function, which will cost you nothing (while in C++03, barring optimizations, it could have cost you a temporary object):
std::string loadFile(const std::string & p_name)
{
std::string content ;
loadFile(p_name, content) ;
return content ;
}
Now, for completeness' sake, I wrote the corresponding saveFile function:
bool saveFile(const std::string & p_name, const std::string & p_content)
{
std::fstream file(p_name.c_str(), std::fstream::out) ;
if(file.is_open())
{
file.write(p_content.c_str(), p_content.length()) ;
return ! file.bad() ;
}
return false ;
}
And here, the "main" I used to test those functions:
int main()
{
const std::string name(".//myFile.txt") ;
const std::string content("AAA BBB CCC\nDDD EEE FFF\n\n") ;
{
const bool success = saveFile(name, content) ;
std::cout << "saveFile(\"" << name << "\", \"" << content << "\")\n\n"
<< "result is: " << success << "\n" ;
}
{
std::string myContent ;
const bool success = loadFile(name, myContent) ;
std::cout << "loadFile(\"" << name << "\", \"" << content << "\")\n\n"
<< "result is: " << success << "\n"
<< "content is: [" << myContent << "]\n"
<< "content ok is: " << (myContent == content)<< "\n" ;
}
}
More?
If you want to do more than that, then you will need to explore the C++ IOStreams library API, at http://www.cplusplus.com/reference/iostream/
You can't use std::istream::read() to read into a std::string object. What you could do is to determine the size of the file, create a string of suitable size, and read the data into the string's character array:
std::string str;
std::ifstream file("whatever");
std::string::size_type size = determine_size_of(file);
str.resize(size);
file.read(&str[0], size);
The tricky bit is determining the size the string should have. Given that the character sequence may get translated while reading, e.g., because line end sequences are transformed, this pretty much amounts to reading the string in the general case. Thus, I would recommend against doing it this way. Instead, I would read the string using something like this:
std::string str;
std::ifstream file("whatever");
if (std::getline(file, str, '\0')) {
...
}
This works OK for text strings and is about as fast as it gets on most systems. If the file can contain null characters, e.g., because it contains binary data, this doesn't quite work. If this is the case, I'd use an intermediate std::ostringstream:
std::ostringstream out;
std::ifstream file("whatever");
out << file.rdbuf();
std::string str = out.str();
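To see the difference between the last two approaches on data that contains a null character, here is a small sketch that takes any std::istream, so a std::istringstream can stand in for the file. The function names read_until_nul and read_everything are hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Reads up to (not including) the first '\0', or to end of input.
std::string read_until_nul(std::istream& in) {
    std::string s;
    std::getline(in, s, '\0');
    return s;
}

// Reads every remaining byte, null characters included.
std::string read_everything(std::istream& in) {
    std::ostringstream out;
    out << in.rdbuf();
    return out.str();
}
```

For the five bytes 'a', 'b', '\0', 'c', 'd', the first function stops after "ab", while the second returns all five bytes.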
A string object is not a mere char array, the line
ifs.read(reinterpret_cast<char*>(&ghhh), sizeof(ghhh));
is probably the root of your problems.
Try applying the following changes:
char ghhh[BUFF_LEN];
....
ifs.read(ghhh, BUFF_LEN);

How can ofstream write NULL to a file in binary mode?

I am maintaining a C++ method which one of my clients is hitting an issue with. The method is supposed to write out a series of identifiers to a file delimited by a new line. However on their machine somehow the method is writing a series of NULL's out to the file. Opening the file in a binary editor shows that it contains all zeros.
I can't understand why this is happening. I've tried assigning empty strings and strings with the first character set to 0. There is no problem creating the file, just writing the identifiers to it.
Here is the method:
void writeIdentifiers(std::vector<std::string> IDs, std::string filename)
{
std::ofstream out (filename.c_str(), std::ofstream::binary);
if (out.is_open())
{
for (std::vector<std::string>::iterator it = IDs.begin();
it != IDs.end();
it++)
{
out << *it << "\n";
}
}
out.close();
}
My question: is there any input you could give that method that would make it create a file containing NULL values?
Yeah, the following code quite clearly writes a series of NULL bytes:
std::vector<std::string> ids;
std::string nullstring;
nullstring.assign("\0\0\0\0\0\0\0\0\0\0", 10);
ids.push_back(nullstring);
writeIdentifiers(ids, "test.dat");
Because the std::string container stores the string length, it can't necessarily be used in the same way as an ordinary C (null-terminated) string. Here, I assign a string containing 10 NULL bytes. Those are then output because the string length is 10.
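The effect can be observed without touching the file system. The sketch below repeats the loop from writeIdentifiers against a generic std::ostream; write_ids and ids_as_bytes are hypothetical names introduced only for this illustration.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Same loop as writeIdentifiers, but writing to any ostream so the result
// can be inspected in memory.
void write_ids(std::ostream& out, const std::vector<std::string>& ids) {
    for (const std::string& id : ids)
        out << id << "\n";
}

// Returns the bytes that would end up in the file for the given IDs.
std::string ids_as_bytes(const std::vector<std::string>& ids) {
    std::ostringstream out;
    write_ids(out, ids);
    return out.str();
}
```

A std::string built with an explicit length, such as std::string(10, '\0'), contributes all ten zero bytes to the output, whereas a string constructed from the C literal "\0\0\0" without a length is empty, because construction from a const char* stops at the first null.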

RapidXML weird parsing

I have a very annoying problem that I have been trying to solve for hours.
I'm using rapidXML with C++ to parse an XML file:
xml_document<> xmlin;
stringstream input; //initialized somewhere else
xmlin.clear();
xmlin.parse<0>(&(input.str()[0]));
cout << "input:" << input.str() << endl << endl;
xml_node<char> *firstnode = xmlin.first_node();
string s_type = firstnode->first_attribute("type")->value();
cout << "type: " << s_type << endl;
However I got this on the stdout:
input:<?xml version="1.0" encoding="utf-8"?><testxml command="testfunction" type="exclusive" />
type: exclusive" />
What could be the reason of this (printing the s_type variable)?
It's very annoying since I can't process the xml well.
Actually I found the solution.
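RapidXML does fast in-situ parsing, which means it modifies the contents of the array it is given and keeps pointers into that array. input.str() returns a temporary copy of the stream's contents, so the parsed nodes were left pointing into a string that was destroyed at the end of the statement.
However, I read in the docs that the string class does not like having its contents modified this way either.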
From the string::c_str documentation page:
the values in this array should not be modified in the program
But when I create a string from the stream it is working as it is expected:
xml_document<> xmlin;
stringstream input; //initialized somewhere else
string buffer = input.str();
xmlin.clear();
xmlin.parse<0>(&(buffer[0]));
I think the problem is in the code you haven't shown... Start by trying this, using a literal string - this works just fine for me...
xml_document<> xmlin;
char input[] = "<?xml version=\"1.0\" encoding=\"utf-8\"?><testxml command=\"testfunction\" type=\"exclusive\" />"; // a writable array: RapidXML modifies the buffer it parses
xmlin.parse<0>(input);
xml_node<char> *firstnode = xmlin.first_node();
std::string s_type = firstnode->first_attribute("type")->value();
I would personally recommend this approach
xml_document<> doc;
string string_to_parse;
char* buffer = new char[string_to_parse.size() + 1];
strcpy(buffer, string_to_parse.c_str());
doc.parse<0>(buffer);
// ... use doc here; its nodes point into buffer ...
delete [] buffer;
This makes a non-const char array out of the string you want to parse. I have always found this way safer and more reliable.
I used to do such crazy things as
string string_to_parse;
doc.parse<0>(const_cast<char*>(string_to_parse.c_str()));
and it "worked" for a long time (until the day it didn't, when I needed to reuse the original string). Since RapidXML can modify the char array it is parsing, and since changing a std::string via c_str() is not recommended, I have always copied my string to a non-const char array and passed that to the parser. It may not be optimal and it uses additional memory, but it is reliable and I have never had any errors or problems with it to date. Your data will be parsed, and the original string can be reused without fear of it having been modified.
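The same copy can be made without manual new and delete by letting a std::vector<char> own the buffer. This is a sketch under assumptions: make_parse_buffer is a hypothetical helper, and the RapidXML call is shown only in a comment because it depends on the library headers.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: copies text into a writable, null-terminated buffer
// that stays alive for as long as the returned vector does.
std::vector<char> make_parse_buffer(const std::string& text) {
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');
    return buffer;
}

// Assumed usage with RapidXML (not compiled here):
//   std::vector<char> buffer = make_parse_buffer(input.str());
//   xml_document<> doc;
//   doc.parse<0>(buffer.data());   // buffer must outlive doc's nodes
```

The vector releases its memory automatically, so the only thing to watch is that it is declared in a scope at least as long-lived as the document that points into it.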