I created a C++ application that reads XML files with the RapidXML parser. On one XML file, structured exactly like another one that worked, the parser threw an error:
"expected <"
The last five characters before the error were from the closing tag of the root element, so the error happened at the end of the file:
</UW>
I suspect this error is related to a whitespace-skipping bug reported for RapidXML v1.12 (I am using v1.13). I used no parsing flags (doc.parse<0>(bfr);).
According to this site, the bug was believed to be caused by a faulty implementation of the parse_trim_whitespace parse flag. A patch was provided there, but that patch also seemed to have a problem.
The following is the XML document that caused this error. Besides the reason for the error, what I also don't understand is why the error didn't happen when parsing another file with the same kind of content. My application also successfully parses several other files before this one.
<?xml version="1.0" encoding="UTF-8"?>
<UW>
<Bez>EV005</Bez>
<Herst>Trumpf</Herst>
<Gesw>16</Gesw>
<Rad>1.6</Rad>
<Hoehe>100</Hoehe>
<Wkl>30</Wkl>
<BgVerf>Freibiegen</BgVerf>
<MaxBel>50</MaxBel>
<Kontur>0</Kontur>
<Grafik>0</Grafik>
</UW>
Here is the part of my application where the error occurs (this is inside a loop):
// Get "Bezeichnung" attribute
attr = subnode->first_attribute("Bezeichnung");
if ( !attr ){ err(ERR_FILE_INVALID,"Werkzeuge.xml"); return 0; }
name = attr->value();
// Get file name/URL
string fileName = name;
fileName.append(".xml");
// Open file
ifstream werkzeugFile(concatURL(PFAD_WERKZEUGE,fileName));
if(!werkzeugFile.is_open()) { err(ERR_FILE_NOTFOUND,fileName); return 0; }
// Get length
werkzeugFile.seekg(0,werkzeugFile.end);
int len = werkzeugFile.tellg();
werkzeugFile.seekg(0,werkzeugFile.beg);
// Allocate buffer
char * bfr = new char [len+1];
werkzeugFile.read(bfr,len);
werkzeugFile.close();
// Parse
SetWindowText(hwndProgress,"Parsing data: Werkzeuge/*.xml");
btmDoc.parse<0>(bfr);
// Get type of tool & check validity
xml_node<> *rt_node = btmDoc.first_node();
if ( strcmp(rt_node->name(),"OW") == 0 ){
isOW = true;
}
else if ( strcmp(rt_node->name(),"UW") == 0 ){
isUW = true;
}
else { err(ERR_FILE_INVALID,fileName); return 0; }
// Prepare for next loop iteration
delete[] bfr;
btmDoc.clear();
subnode = subnode->next_sibling();
Ah, I think I see it. Two things:
First, the ifstream is suspicious. Shouldn't it be opened in binary mode if you're jumping around in it using byte offsets (and somebody else is doing the parsing)? Pass std::ios::in | std::ios::binary as the second argument to the ifstream constructor.
Second, your memory management seems fine, except that you allocate one extra byte (the +1) but never seem to use it. I'm assuming you're missing bfr[len] = '\0'; after the contents are read in. That explains the odd parse error at the end of the file: the XML parser doesn't know it has reached the end, because it's parsing a null-terminated string that isn't null-terminated, and it tries to parse random bytes of memory ;-)
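A minimal sketch of both fixes, assuming the full path has already been built (the question uses concatURL and PFAD_WERKZEUGE for that; the helper name load_null_terminated is hypothetical). It opens the file in binary mode and keeps the extra byte as a '\0' terminator, with std::vector<char> standing in for new[]/delete[]:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: loads a whole file into a buffer that ends with '\0',
// which is what rapidxml::xml_document<>::parse<0>(char*) expects.
// Binary mode keeps tellg's byte count equal to the number of bytes read.
std::vector<char> load_null_terminated(const std::string& path) {
    std::ifstream file(path, std::ios::in | std::ios::binary);
    if (!file.is_open()) return {};

    file.seekg(0, std::ios::end);
    const std::streamoff len = file.tellg();
    if (len < 0) return {};                  // tellg failed
    file.seekg(0, std::ios::beg);

    // len + 1 bytes: the extra byte is the terminator the parser needs.
    std::vector<char> bfr(static_cast<std::size_t>(len) + 1, '\0');
    file.read(bfr.data(), len);
    // Terminate right after whatever was actually read (the bfr[len] = '\0' fix).
    bfr[static_cast<std::size_t>(file.gcount())] = '\0';
    return bfr;
}
```

Usage would then be roughly std::vector<char> bfr = load_null_terminated(path); btmDoc.parse<0>(bfr.data());. RapidXML parses the buffer in place and its nodes point into it, so bfr has to outlive any use of btmDoc's nodes. RapidXML's own rapidxml_utils.hpp also offers a file<> class that loads a file into a null-terminated buffer.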
I'm attempting to write a lexer and parser, but I'm having trouble getting the final variable in a text file because in_file.tellg() returns -1. My program only works if I add a space character after the variable; otherwise I get a compiler error. I can read every other variable in the text file except the last one. I believe the cause of the problem is in_file.peek()!=EOF setting in_file.tellg() to -1.
My program is something like this:
ifstream in(file_name);
char c;
in >> noskipws;
while(in >> c ){
if(is_letter_part_of_variable(c)) {
int start_pos = in.tellg(),
end_pos,
length;
while(is_letter_part_of_variable(c) && in.peek()!=EOF ) {
in>>c;
}
end_pos = in.tellg(); // This becomes -1 for some reason
length = end_pos - start_pos; // Should be 7
// Reset file pointer to original position to chomp word.
in.clear();
in.seekg(start_pos-1, in.beg);
// The word 'message' should go in here.
char *identifier = new char[length];
in.read(identifier, length);
identifier[length] = '\0';
}
}
example.text
message = "Hello, World"
print message
I tried removing peek()!= EOF, which gives me an infinite loop. I tried !in_file.eof() and that also makes tellg() return -1. What can I do to fix or improve this code?
I believe the cause of the problem is in_file.peek()!=EOF setting in_file.tellg() to -1.
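Close. peek attempts to read a character and returns EOF if there is nothing left in the stream; when that happens it sets the stream's eofbit. tellg returns -1 whenever the stream is no longer good() (since C++11 its sentry check turns the set eofbit into failbit), so after that peek every tellg call reports -1.
Simple solution
Clear the state flags with in.clear() before calling tellg.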
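A small sketch that reproduces the behaviour on a std::istringstream instead of a file (the helper name position_after_peek_at_end is hypothetical): once peek has run off the end, tellg reports -1, and after in.clear() it reports the real position again.

```cpp
#include <cassert>
#include <cstdio>
#include <sstream>
#include <string>

// Hypothetical helper: consumes the whole text using the question's
// peek() != EOF pattern, then asks tellg for the position.
std::streamoff position_after_peek_at_end(const std::string& text, bool clear_first) {
    std::istringstream in(text);
    char c;
    while (in.peek() != EOF) {   // the final peek sets eofbit
        in.get(c);
    }
    if (clear_first) {
        in.clear();              // reset eofbit/failbit so tellg can succeed
    }
    return static_cast<std::streamoff>(in.tellg());
}
```

With "message" as input, the uncleared call yields -1 and the cleared call yields 7, the length of the word.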
Better solution
Use std::string.
std::string identifier;
while(in>>c && is_letter_part_of_variable(c)) {
identifier += c;
}
All of the messing around with peek, seekg, tellg and the dreaded new vanish.
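Building on that idea, a sketch of a complete identifier scanner. It assumes is_letter_part_of_variable accepts letters and underscores; the question doesn't show its definition, so the version here is a stand-in, and read_identifiers is a hypothetical name. It also flushes a word that runs right up to the end of the file, which is the case the question was stuck on.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for the question's predicate (assumption: letters and '_').
bool is_letter_part_of_variable(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) || c == '_';
}

// Hypothetical scanner: collects every identifier, no tellg/seekg/new needed.
std::vector<std::string> read_identifiers(std::istream& in) {
    std::vector<std::string> identifiers;
    std::string current;
    char c;
    while (in.get(c)) {                  // get() does not skip whitespace
        if (is_letter_part_of_variable(c)) {
            current += c;                // grow the word one character at a time
        } else if (!current.empty()) {
            identifiers.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {              // word that ended exactly at EOF
        identifiers.push_back(current);
    }
    return identifiers;
}
```

For the input message = 42 followed by print message on the next line, this yields message, print and message, with no trailing space required.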
I'm trying to create a program that encrypts files based on how Nazi Germany's Enigma machine worked, but without the flaw :P.
I have a function that gets the character at position n in a file, but when it returns a newline character and I cout << it, it's as if it hit Enter twice.
For example, if I loop over successive positions in the file and cout each character, the lines in the terminal are separated by more than one line break.
Here's the function:
char charN(string pathOf, int pointIn){
char r = NULL;
// NULL so I can tell when it doesn't return a character.
int sizeOf; //to store the found size of the file.
ifstream cf; //to store the Character Found.
ifstream siz; //used later to get the size of the file
siz.open(pathOf.c_str());
siz.seekg(0, std::ios::end);
sizeOf = siz.tellg(); // these get the length of the file and put it in sizeOf.
cf.open(pathOf.c_str());
if(cf.is_open() && pointIn < sizeOf){ //if not open, or if the character to get is farther out than the size of the file, let the function return the error condition: 'NULL'.
cf.seekg(pointIn); // move to the point in the file where the character should be, get it, and get out.
cf.get(r);
cf.close();
}
return r;
}
It works correctly if I use cout << '\n', but what's different about returns from a file and '\n'?
Or is there something else I'm missing?
I've been googling, but I can't find anything remotely similar to my problem. Thanks in advance.
I'm using Code::Blocks 13.12 as my compiler if that matters.
Is this on a Windows machine? On Windows, newlines in text files are represented by \r\n.
\r = carriage return
\n = line feed
It's possible that you are outputting each of them separately, so a single line break in the file shows up as two line breaks in the terminal.
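A minimal sketch of one way to sidestep this, assuming the goal is to see each line break exactly once (both helper names are hypothetical). Binary mode makes every byte offset correspond to one byte of the file, so a line break shows up as '\r' followed by '\n', and the second function collapses each such pair before printing:

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Reads the whole file byte-for-byte; binary mode disables CRLF translation.
std::string read_file_binary(const std::string& path) {
    std::ifstream f(path, std::ios::in | std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(f),
                       std::istreambuf_iterator<char>());
}

// Collapses each Windows "\r\n" pair into a single '\n'; a lone '\r' is kept.
std::string normalize_newlines(const std::string& raw) {
    std::string out;
    out.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] == '\r' && i + 1 < raw.size() && raw[i + 1] == '\n') {
            continue;   // drop the '\r'; the following '\n' is kept
        }
        out += raw[i];
    }
    return out;
}
```

Indexing into the normalized string then replaces the repeated open/seekg/get calls in charN, which also avoids reopening the file for every character.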
EDIT: Problem solved! It turns out Windows 7 won't let me read/write these files without explicitly running as administrator. If I run as admin it works fine; if I don't, I get the weird results I describe below.
I've been trying to get a part of a larger program of mine to read a file.
Despite trying multiple methods (istream::getline, std::getline, the >> operator, etc.), all of them return either \0, an empty value, or a random number / whatever I initialised the variable with.
My first thought was that the file didn't exist or couldn't be opened; however, the state flags .good, .bad and .eof all indicate no problems, and the file I'm trying to read is certainly in the same directory as the debug .exe and contains data.
I'd most like to use istream::getline to read lines into a char array, however reading lines into a string array is possible too.
My current code looks like this:
void startup::load_settings(char filename[]) //master function for opening a file.
{
int i = 0; //count variable
int num = 0; //var containing all the lines we read.
char line[5];
ifstream settings_file (settings.inf);
if (settings_file.is_open());
{
while (settings_file.good())
{
settings_file.getline(line, 5);
cout << line;
}
}
return;
}
As said above, it compiles, but it just puts \0 into every element of the char array, as all the other methods I've tried do.
Thanks for any help.
Firstly, your code is not complete: what is settings.inf?
Secondly, you are most probably reading everything fine; the problem is the way you are printing it.
With char line[5]; and cout << line;, make sure that the last element of the array is \0.
You can do something like this:
line[4] = '\0';
or you can manually print the value of each element of the array in a loop.
You can also try printing the character codes, in hex for example, because the values (character codes) in the array might not be in the visible range of ASCII characters. For example:
cout << hex << (int)(unsigned char)line[i] << ' ';
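A sketch of the same routine rewritten with std::getline and std::string, plus a hex dump helper (the names load_settings_lines and hex_codes are hypothetical). It quotes the file name, drops the stray ';' after the if (which makes the block run unconditionally), and tests the read itself in the loop condition. A std::string also avoids a limit of the 5-char buffer: istream::getline sets failbit when a line has more than count - 1 (here 4) characters, which ends a while (good()) loop early.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Reads every line of a settings file into a vector of strings.
std::vector<std::string> load_settings_lines(const std::string& filename) {
    std::vector<std::string> lines;
    std::ifstream settings_file(filename);    // e.g. "settings.inf", quoted
    if (settings_file.is_open()) {             // no ';' after the condition
        std::string line;
        while (std::getline(settings_file, line)) {   // loop on the read itself
            lines.push_back(line);
        }
    }
    return lines;
}

// Formats each character code as two hex digits separated by spaces.
std::string hex_codes(const std::string& s) {
    std::ostringstream out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i != 0) out << ' ';
        out << std::hex << std::setw(2) << std::setfill('0')
            << static_cast<int>(static_cast<unsigned char>(s[i]));
    }
    return out.str();
}
```

For example, hex_codes("AB") gives "41 42", which makes invisible characters such as '\r' or '\0' easy to spot.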
UPDATE: Yes, answered and solved. I also then managed to find the issue with the output that was the real problem I was having. I had thought the substring error was behind it, but I was wrong: once that had been fixed, the output issue persisted. It turned out to be a simple mix-up in the calculations. I had been subtracting 726 instead of 762. I could've had this done hours ago... Lulz. That's all I can say... Lulz.
I am teaching myself C++ (with the tutorial from their website). I have jumped ahead from time to time when I have needed to do something I cannot do with what I have learned so far. Additionally, I wrote this relatively quickly. So, if my code looks inelegant or otherwise unacceptable at a professional level, please excuse that for now. My only current purpose is to get this question answered.
This program takes each line of a text file I have. Note that the text file's lines look like this:
.123.456.789
It has 366 lines. The program I first wrote to deal with this had me input each of the three numbers for each line manually. As I'm sure you can imagine, that was extremely inefficient. This program's purpose is to take each number out of the text file and perform functions and output the results to another text file. It does this per line until it reaches the end of the file.
I have read up more on what could cause this error, but I cannot find the cause of it in my case. Here is the bit of the code that I believe to contain the cause of the problem:
int main()
{
double a;
double b;
double c;
double d;
double e;
string search; //The string for lines fetched from the text file
string conversion;
string searcha; //Characters 1-3 of search are inserted to this string.
string searchb; //Characters 5-7 of search are inserted to this string.
string searchc; //Characters 9-11 of search are inserted to this string.
string subsearch; //Used with the substring to fetch individual characters.
string empty;
fstream convfil;
convfil.open("/home/user/Documents/MPrograms/filename.txt", ios::in);
if (convfil.is_open())
{
while (convfil.good())
{
getline(convfil,search); //Fetch line from text file
searcha = empty;
searchb = empty;
searchc = empty;
/*From here to the end seems to be the problem.
I provided code from the beginning of the program
to make sure that if I were erring earlier in the code,
someone would be able to catch that.*/
for (int i=1; i<4; ++i)
{
subsearch = search.substr(i,1);
searcha.insert(searcha.length(),subsearch);
a = atof(searcha.c_str());
}
for (int i=5; i<8; ++i)
{
subsearch = search.substr(i,1);
searchb.insert(searchb.length(),subsearch);
b = atof(searchb.c_str());
}
for (int i=9; i<search.length(); ++i)
{
subsearch = search.substr(i,1);
searchc.insert(searchc.length(),subsearch);
c = atof(searchc.c_str());
}
I usually teach myself how to get around these issues by looking at references and at problems other people have had, but I couldn't find anything that helped in this instance. I have tried numerous variations on this, but since the issue has something to do with the substring and I couldn't get rid of the substring in any of these variations, all of them produced the same error and the same result in the output file.
This is a problem:
while (convfil.good()) {
getline(convfil,search); //Fetch line from text file
You test for failure before you do the operation that can fail. When getline does fail, you're already inside the loop.
As a result, your code tries to process an invalid record at the end.
Instead try
while (getline(convfil,search)) { //Fetch line from text file
or even
while (getline(convfil,search) && search.length() > 9) {
which will also stop without error if there's a blank line at the end of the file.
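A sketch of the loop with that fix applied, assuming each valid line looks like .123.456.789 as shown in the question (the struct Triple and the function parse_triples are hypothetical). It also swaps the character-by-character substr/insert/atof loops for one substr per field and std::stod; that is a different technique from the question's, but it extracts the same characters (1-3, 5-7, and 9 to the end of the line).

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Triple { double a, b, c; };

// getline is the loop condition, so the body never sees the failed read at
// end of file; blank or short lines are skipped instead of reaching substr.
std::vector<Triple> parse_triples(std::istream& in) {
    std::vector<Triple> result;
    std::string search;
    while (std::getline(in, search)) {
        if (search.length() <= 9) {                // same guard as length() > 9
            continue;
        }
        Triple t;
        t.a = std::stod(search.substr(1, 3));      // characters 1-3
        t.b = std::stod(search.substr(5, 3));      // characters 5-7
        t.c = std::stod(search.substr(9));         // character 9 to end of line
        result.push_back(t);
    }
    return result;
}
```

std::stod throws std::invalid_argument if a field isn't numeric, so a malformed line fails loudly instead of silently becoming 0 as it would with atof.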
It's possible you are reading a blank line at the end of the file and trying to process it.
Test for an empty string before processing it.
So I have a binary file that I create and initialize. If I call seekg(0) or seekp(0), I can overwrite the line of text fine. However, if I jump ahead 26 bytes (the size of one line of my file, which I have definitely confirmed), it refuses to overwrite. Instead it just adds the new data before the existing binary data and pushes the old data further along the line. I want the data completely overwritten.
char space1[2] = { ',' , ' '};
int main()
{
CarHashFile lead;
lead.createFile(8, cout);
fstream in;
char* tempS;
tempS = new char[25];
in.open("CarHash.dat", ios::binary | ios::in | ios::out);
int x = 2000;
for(int i = 0; i < 6; i++)
tempS[i] = 'a';
int T = 30;
in.seekp(26); //Start of second line
in.write(tempS, 6); //Will not delete anything, will push
in.write(space1, sizeof(space1)); //contents back
in.write((char *)(&T), sizeof(T));
in.write(space1, sizeof(space1));
in.write(tempS,6);
in.write(space1, sizeof(space1));
in.write((char *)&x, sizeof(x));
//Now we will use seekp(0) and write to the first line
//it WILL overwrite the first line perfectly fine
in.seekp(0);
in.write(tempS, 6);
in.write((char*) &x, sizeof(x));
in.write(tempS, 6);
in.write((char *) &T, sizeof(T));
return 0;
}
CarHashFile is an outside class that, when createFile is invoked, creates a binary file full of the following contents: "Free, " 1900 ", Black, $" 0.00f.
Everything enclosed in quotes was added as a string, 1900 as an int, and 0.00f as a float. I added all of these with write, so I'm pretty sure it's an actual binary file; I just don't know why it only overwrites the first line. I know the record size is correct because if I seekp(26), the write lands at the beginning of the second line and pushes it along. space1 was created to make adding the ", " pair easy; there is also a char dol[1] = { '$' } array for simplicity, and a char nl[1] = { '\n' } that adds a newline to the binary file (I just tried removing that write and it forced everything onto one row, so as far as I know, it's needed).
EDIT: OK, so it was overwriting the line all along; it just wasn't putting in a newline (kind of embarrassing). But now I can't figure out how to insert a newline into the file. I tried writing it the way I originally did, with char nl[1] = { '\n' }. That worked when I first created the file, but not afterwards. Are there any other ways to add lines? I also tried in << endl and got nothing.
I suggest taking this one step at a time. The code looks OK to me, but without error checking, almost any behavior could be happening unnoticed.
Add error checks and reporting to all operations on in.
If that shows no issues, do a simple seek, then a write:
in.seekp(26);
// print in.good() / in.fail()
in.write("Hello World", 11);
// print in.good() / in.fail()
in.close();
Let us know what happens.
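A sketch of that step-by-step approach wrapped in a function, assuming fixed 26-byte records with no newlines between them, as in the question (overwrite_record and kRecordSize are hypothetical names). The file is opened with ios::in | ios::out | ios::binary, which neither truncates nor appends, so seekp followed by write replaces bytes in place, and every stream operation is checked:

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Record size taken from the question (26 bytes per record).
const std::streamoff kRecordSize = 26;

// Overwrites the start of record `index` in place; returns false on any failure.
bool overwrite_record(const std::string& path, std::streamoff index,
                      const char* data, std::streamsize size) {
    if (size > kRecordSize) return false;    // never spill into the next record
    std::fstream f(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!f.is_open()) return false;
    if (!f.seekp(index * kRecordSize)) return false;
    if (!f.write(data, size)) return false;
    f.flush();
    return static_cast<bool>(f);             // false if the flush failed
}
```

Because every record has the same size, the file length never changes: writing "aaaaaa" into record 1 of a 78-byte file leaves it at 78 bytes, with only bytes 26 to 31 replaced.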
The real problem wasn't my understanding of file streams; it was my lack of understanding of binary files. The newline screwed everything up royally: it could be added fine at one point, but dealing with it later was a huge hassle. Once I removed it, everything else fell into place. As for the lack of error checking and of closing files: this is just driver code. It's as bare-bones as possible; I really didn't care what happened to the file at that point, and I knew it was being opened, so why waste my time? The final version, written when the main program was rewritten, has error checks. And like I said, what I didn't understand was binary files, not file streams, so AJ's response wasn't very useful at all. The assignment required 25 characters for the name; no name is 25 characters long, so the rest gets filled with junk. That's a byproduct of the project and there's nothing I can do about it, other than fill it with spaces, which takes more time than skipping ahead and writing from there. So I chose to write what would probably be an average name (8 characters) and then just jump ahead 25 afterwards. The only real solution given here came from Emile, who told me to get a hex editor. THAT really helped. Thanks for your time.