wistringstream from an xml file to an integer? - c++

const XMLDataNode *pointsNode = node->GetChildren().at(0);
std::wistringstream pointsstrm(*pointsNode->GetInnerText());
pointsstrm >> loadedGame.points;
This is code I've written to pull an int from an XML file and pass it into loadedGame.points (an int). However, this isn't working. It compiles but doens't give the right value. Why is that? XMLDataNode is a class that manipulates xmllite.dll.

Time for some wild guesses!
I'll bet you that the text you get from *pointsNode->GetInnerText() isn't what you think it is. Have you checked that it is indeed exactly the text you want? In particular, could it contain whitespace? Parsing a nicely formatted (i.e. indented, broken into lines, etc) XML file without a schema to reference ends up meaning that all sorts text nodes involving whitespace will end up in your DOM tree.

Related

How to input an arbitrary number of text files in C++?

so I'm working on a coding project for a class, and I understand the basic things I want to accomplish, but one thing that nobody seems to be able to help me with is inputting an unspecified number of text files. The user is prompted to enter the text files they want to compare (overall purpose of my code), separated by spaces, thus allowing them to compare an arbitrary amount of text files (eg. 2, 3, 8, 16, etc). I know that the getline function is helpful here, as well as searching for the number of "." because files can only contain one ".", all within a for loop. After that logic I am utterly lost. Eventually, I'm going to have to open the text files and put them in sets to compare them against every other file once, and output their similarities and differences into yet another text file. Any ideas?
Here is the general process I would try to follow (if I interpreted the prompt correctly)
Get the line of text files using getline
Put that into a stringstream
Open the next file in the stream while there is still information in the stringstream (not at eof)
Store all of that information in a Vector of strings, each new file just appended on after it is read
compare strings in the vector
If you pass the text files on the commandline rather than getting them from a little dialog with the user via stdin life will be easier. Most users will type
compare *
which on Unix type systems is expanded to a list of files. ON DOS you need to match and expand the wild card yourself.
You've got an N squared problem, but the logic is easy, it's just
int mian(int argc, char **argv)
{
int i, j;
for(i=1;i<argc;i++)
for(j=i+1;j>argc;j++)
compare(argv[i], argv[j];
}

Boost property tree (XML) remove blank lines

I am using boost::property_tree to read and write xml configuration files. I want to change the value of some tags in my code and write them back to file, with some reasonable xml formatting (new lines, indenting, etc.).
Currently I am using
std::fstream fs("filename");
boost::property_tree::ptree pt;
bpt::xml_parser::read_xml(fs,pt);
// replace value
pt.erase("tagname");
pt.put("tagname",newval);
bpt::xml_parser::xml_writer_settings<char> xmlstyle(' ',4);
bpt::xml_parser::write_xml("filename",pt,std::locale(),xmlstyle);
But it seems that every time a tag is deleted, it leaves behind a blank line and after some iterations the xml becomes unreadable. Is there a way to remove empty lines from the property tree itself or from the resulting xml file using boost?
I know there are other ways of removing the newlines by reading and parsing the entire file again, but I was hoping for a more convenient one-liner.
Ok, it looks like the answer was already out there on Stack Overflow, I just hadn't found it (newlines were not mentioned in the post)
boost::property_tree XML pretty printing
The solution is to read the file with boost::property_tree::xml_parser::trim_whitespace
Not Boost, but blank lines in particular.
You can use std::regex_replace() on the output before it is written to the file, removing the blank lines, like this
std::regex_replace(std::ostreambuf_iterator<char>(fout), text.begin(), text.end(), std::regex("(\\n+)"), "\n");
With fout as the file output stream and text as the output data as a std::string.
This replaces every newline followed by another newline w/o any characters in between with a single newline.

Extract specific portion of HTML file using c++/boost::regex

I have a series of thousands of HTML files and for the ultimate purpose of running a word-frequency counter, I am only interested on a particular portion from each file. For example, suppose the following is part of one of the files:
<!-- Lots of HTML code up here -->
<div class="preview_content clearfix module_panel">
<div class="textelement "><div><div><p><em>"Portion of interest"</em></p></div>
</div>
<!-- Lots of HTML code down here -->
How should I go about using regular expressions in c++ (boost::regex) to extract that particular portion of text highlighted in the example and put that into a separate string?
I currently have some code that opens the html file and reads the entire content into a single string, but when I try to run a boost::regex_match looking for that particular beginning of line <div class="preview_content clearfix module_panel">, I don't get any matches. I'm open to any suggestions as long as it's on c++.
How should I go about using regular expressions in c++ (boost::regex) to extract that particular portion of text highlighted in the example and put that into a separate string?
You don't.
Never use regular expressions to process HTML. Whether in C++ with Boost.Regex, in Perl, Python, JavaScript, anything and anywhere. HTML is not a regular language; therefore, it cannot be processed in any meaningful way via regular expressions. Oh, in extremely limited cases, you might be able to get it to extract some particular information. But once those cases change, you'll find yourself unable to get done what you need to get done.
I would suggest using an actual HTML parser, like LibXML2 (which does have the ability to read HTML4). But using regex's to parse HTML is simply using the wrong tool for the job.
Since all I needed was something quite simple (as per question above), I was able to get it done without using regex or any type of parsing. Following is the code snippet that did the trick:
// Read HTML file into string variable str
std::ifstream t("/path/inputFile.html");
std::string str((std::istreambuf_iterator<char>(t)), std::istreambuf_iterator<char>());
// Find the two "flags" that enclose the content I'm trying to extract
size_t pos1 = str.find("<div class=\"preview_content clearfix module_panel\">");
size_t pos2 = str.find("</em></p></div>");
// Get that content and store into new string
std::string buf = str.substr(pos1,pos2-pos1);
Thank you for pointing out the fact that I was totally on the wrong track.

getline() text with UNIX formatting characters

I am writing a C++ program which reads lines of text from a .txt file. Unfortunately the text file is generated by a twenty-something year old UNIX program and it contains a lot of bizarre formatting characters.
The first few lines of the file are plain, English text and these are read with no problems. However, whenever a line contains one or more of these strange characters mixed in with the text, that entire line is read as characters and the data is lost.
The really confusing part is that if I manually delete the first couple of lines so that the very first character in the file is one of these unusual characters, then everything in the file is read perfectly. The unusual characters obviously just display as little ascii squiggles -arrows, smiley faces etc, which is fine. It seems as though a decision is being made automatically, without my knowledge or consent, based on the first line read.
Based on some googling, I suspected that the issue might be with the locale, but according to the visual studio debugger, the locale property of the ifstream object is "C" in both scenarios.
The code which reads the data is as follows:
//Function to open file at location specified by inFilePath, load and process data
int OpenFile(const char* inFilePath)
{
string line;
ifstream codeFile;
//open text file
codeFile.open(inFilePath,ios::in);
//read file line by line
while ( codeFile.good() )
{
getline(codeFile,line);
//check non-zero length
if (line != "")
ProcessLine(&line[0]);
}
//close line
codeFile.close();
return 1;
}
If anyone has any suggestions as to what might be going on or how to fix it, they would be very welcome.
From reading about your issues it sounds like you are reading in binary data, which will cause getline() to throw out content or simply skip over the line.
You have a couple of choices:
If you simply need lines from the data file you can first sanitise them by removing all non-printable characters (that is the "official" name for those weird ascii characters). On UNIX a tool such as strings would help you with that process.
You can off course also do this programmatically in your code by simply reading in X amount of data, storing it in a string, and then removing those characters that fall outside of the standard ASCII character range. This will most likely cause you to lose any unicode that may be stored in the file.
You change your program to understand the format and basically write a parser that allows you to parse the document in a more sane way.
If you can, I would suggest trying solution number 1, simply to see if the results are sane and can still be used. You mention that this is medical data, do you per-chance know what file format this is? If you are trying to find out and have access to a unix/linux machine you can use the utility file and maybe it can give you a clue (worst case it will tell you it is simply data).
If possible try getting a "clean" file that you can post the hex dump of so that we can try to provide better help than that what we are currently providing. With clean I mean that there is no personally identifying information in the file.
For number 2, open the file in binary mode. You mentioned using Windows, binary and non-binary files in std::fstream objects are handled differently, whereas on UNIX systems this is not the case (on most systems, I'm sure I'll get a comment regarding the one system that doesn't match this description).
codeFile.open(inFilePath,ios::in);
would become
codeFile.open(inFilePath, ios::in | ios::binary);
Instead of getline() you will want to become intimately familiar with .read() which will allow unformatted operations on the ifstream.
Reading will be like this:
// This code has not been tested!
char input[1024];
codeFile.read(input, 1024);
int actual_read = codeFile.gcount();
// Here you can process input, up to a maximum of actual_read characters.
//ProcessLine() // We didn't necessarily read a line!
ProcessData(input, actual_read);
The other thing as mentioned is that you can change the locale for the current stream and change the separator it considers a new line, maybe this will fix your issue without requiring to use the unformatted operators:
imbue the stream with a new locale that only knows about the newline. This method may or may not let your getline() function without issues.

Incorporating text files in applications?

Is there anyway I can incorporate a pretty large text file (about 700KBs) into the program itself, so I don't have to ship the text files together in the application directory ? This is the first time I'm trying to do something like this, and I have no idea where to start from.
Help is greatly appreciated (:
Depending on the platform that you are on, you will more than likely be able to embed the file in a resource container of some kind.
If you are programming on the Windows platform, then you might want to look into resource files. You can find a basic intro here:
http://msdn.microsoft.com/en-us/library/y3sk7e6b.aspx
With more detailed information here:
http://msdn.microsoft.com/en-us/library/zabda143.aspx
Have a look at the xxd command and its -include option. You will get a buffer and a length variable in a C formatted file.
If you can figure out how to use a resource file, that would be the preferred method.
It wouldn't be hard to turn a text file into a file that can be compiled directly by your compiler. This might only work for small files - your compiler might have a limit on the size of a single string. If so, a tiny syntax change would make it an array of smaller strings that would work just fine.
You need to convert your file by adding a line at the top, enclosing each line within quotes, putting a newline character at the end of each line, escaping any quotes or backslashes in the text, and adding a semicolon at the end. You can write a program to do this, or it can easily be done in most editors.
This is my example document:
"Four score and seven years ago,"
can be found in the file c:\quotes\GettysburgAddress.txt
Convert it to:
static const char Text[] =
"This is my example document:\n"
"\"Four score and seven years ago,\"\n"
"can be found in the file c:\\quotes\\GettysburgAddress.txt\n"
;
This produces a variable Text which contains a single string with the entire contents of your file. It works because consecutive strings with nothing but whitespace between get concatenated into a single string.