I've been having an issue with parsing a file and using seekg(). Whenever a certain character is reached in the file, I want to loop until a condition is met. The loop works fine for the first iteration, but when it loops back, the file seemingly skips a character and the loop does not behave as expected.
Specifically, the loop works fine if it is all contained in one line in the file, but fails when there is at least one newline within the loop in the file.
I should mention I am working on this on Windows, and I feel like the issue arises from how Windows ends lines with \r\n.
Using seekg(-2, std::ios::cur) after looping back fixes the issue when the beginning loop condition is immediately followed by a newline, but does not work for a loop contained in the same line.
The code is structured by having an Interpreter class hold the file pointer and relevant variables, such as the current line and column. This class also has a functional map defined like so:
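// Define function type for command map
typedef void (Interpreter::*function)(void);
// Map for all the commands (a pointer to member must be written as &Interpreter::name)
std::map<char, function> command_map = {
    {'+', &Interpreter::increment_cell},
    {'-', &Interpreter::decrement_cell},
    {'>', &Interpreter::increment_ptr},
    {'<', &Interpreter::decrement_ptr},
    {'.', &Interpreter::output},
    {',', &Interpreter::input},
    {'[', &Interpreter::begin_loop},
    {']', &Interpreter::end_loop},
    {' ', &Interpreter::next_col},
    {'\n', &Interpreter::next_line}
};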
It iterates through each character, deciding if it has functionality or not in the following function:
// Iterating through the file
void Interpreter::run() {
    char current_char;
    if(!this->file.eof() && this->file.good()) {
        while(this->file.get(current_char)) {
            // Make sure character is functional command (i.e. not a comment)
            if(this->command_map.find(current_char) != this->command_map.end()) {
                // Print the current command if in debug mode
                if(this->debug_mode && current_char != ' ' && current_char != '\n') {
                    std::cout << this->filename << ":" << this->line << ":"
                              << this->column << ": " << current_char << std::endl;
                }
                // Execute the command
                (this->*(command_map[current_char]))();
            }
            // If it is not a functional command, it is a comment. The rest of the line is ignored
            else{
                std::string temp_line = "";
                std::getline(file, temp_line);
                this->line++;
                this->column = 0;
            }
            this->temp_pos = file.tellg();
            this->column++;
        }
    }
    else {
        std::cout << "Unable to find file " << this->filename << "." << std::endl;
        exit(1);
    }
    file.close();
}
The beginning of the loop (signaled by a '[' char) sets the beginning loop position to this->temp_pos:
void Interpreter::begin_loop() {
    this->loop_begin_pointer = this->temp_pos;
    this->loop_begin_line = this->line;
    this->loop_begin_col = this->column;
    this->run();
}
When the end of the loop (signaled by a ']' char) is reached, if the condition for ending the loop is not met, the file cursor position is set back to the beginning of the loop:
void Interpreter::end_loop() {
    // If the cell's value is 0, we can end the loop
    if(this->char_array[this->char_ptr] == 0) {
        this->loop_begin_pointer = -1;
    }
    // Otherwise, go back to the beginning of the loop
    if(this->loop_begin_pointer > -1){
        this->file.seekg(this->loop_begin_pointer, std::ios::beg);
        this->line = this->loop_begin_line;
        this->column = this->loop_begin_col;
    }
}
I was able to put in debugging information and can show stack traces for further clarity on the issue.
Stack trace with one line loop ( ++[->+<] ):
+ + [ - > + < ] [ - > + < ] done.
This works as intended.
Loop with multiple lines:
++[
-
>
+<]
Stack trace:
+ + [ - > + < ] > + < ] <- when it looped back, it "skipped" '[' and '-' characters.
This loops forever since the end condition is never met (i.e. the value of the first cell is never 0, because it never gets decremented).
Oddly enough, the following works:
++[
-
>+<]
It follows the same stack trace as the first example. This working and the last example not working is what has made this problem hard for me to solve.
Please let me know if more information is needed about how the program is supposed to work or its outputs. Sorry for the lengthy post, I just want to be as clear as possible.
Edit 1:
The class stores the file object as a member: std::ifstream file;
In the constructor, it is opened with
this->file.open(filename), where filename is passed in as an argument.
For a file stream, seekg is ultimately defined in terms of fseek from the C standard library. The C standard has this to say:
7.21.9.2/4 For a text stream, either offset shall be zero, or offset shall be a value returned by an earlier successful call to the ftell function on a stream associated with the same file and whence shall be SEEK_SET.
So for a file opened in text mode, you can't do any arithmetic on offsets. You can rewind to the beginning, position at the end, or return to the position you were at previously and captured with tellg (which ultimately calls ftell). Anything else would exhibit undefined behavior.
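The practical consequence for the interpreter in the question: avoid offset arithmetic such as seekg(-2, std::ios::cur), open the file with std::ios::binary so that tellg() reports plain byte offsets with no \r\n translation, and ignore '\r' like any other non-command byte. A minimal sketch under those assumptions follows. The function name run_stream, the fixed-size tape, and the stack of saved loop positions (which also allows nested loops) are illustrative choices, not the original Interpreter class. Every seekg() goes back to a position previously returned by tellg().

```cpp
#include <cstddef>
#include <istream>
#include <vector>

// Hypothetical helper, not the Interpreter class from the question.
// Executes '+', '-', '>', '<', '[' and ']' straight from a stream and ignores
// every other byte, including '\r' and '\n'. Loops only jump back to positions
// previously obtained from tellg(). Assumes well-formed brackets; tape.at()
// throws std::out_of_range if the data pointer leaves the tape.
std::vector<int> run_stream(std::istream& in, std::size_t cells = 8) {
    std::vector<int> tape(cells, 0);
    std::size_t ptr = 0;
    std::vector<std::streampos> loop_starts;  // positions just after each open '['
    char c;
    while (in.get(c)) {
        switch (c) {
        case '+': ++tape.at(ptr); break;
        case '-': --tape.at(ptr); break;
        case '>': ++ptr; break;
        case '<': --ptr; break;
        case '[':
            if (tape.at(ptr) == 0) {
                // Skip forward to the matching ']' without executing anything.
                int depth = 1;
                while (depth > 0 && in.get(c)) {
                    if (c == '[') ++depth;
                    else if (c == ']') --depth;
                }
            } else {
                loop_starts.push_back(in.tellg());  // remember where the body starts
            }
            break;
        case ']':
            if (tape.at(ptr) != 0) {
                in.seekg(loop_starts.back());  // run the body again
            } else {
                loop_starts.pop_back();  // loop finished
            }
            break;
        default:
            break;  // comments, spaces, '\r', '\n'
        }
    }
    return tape;
}
```

For example, opening the four-line program from the question (saved with \r\n line endings) as std::ifstream file(filename, std::ios::binary); and calling run_stream(file) leaves cell 0 at 0 and cell 1 at 2, the same result as the one-line version.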
Related
I'm trying to learn how to use file handling in C++.
I'd like to save and edit the contents of variables; however, there seems to be a problem with my logic, because the following program somehow rewrites the contents of the file over and over.
#include <stdio.h>
#include <iostream>
int main(int argc, char const *argv[])
{
    FILE *filePointer = fopen("text.txt", "r+");
    char currentChar;
    int loop = 0;
    currentChar = getc(filePointer);
    while (currentChar != EOF && loop < 100)
    {
        if (currentChar == '=')
        {
            fseek(filePointer, 1, 1);
            if (fputs("LOL", filePointer) == EOF)
            {
                return 2;
            }
        }
        std::cout << ftell(filePointer) << "Current Char: " << currentChar << std::endl;
        currentChar = getc(filePointer);
        loop++;
    }
    fclose(filePointer);
    return 0;
}
The code that the program reads is the following:
"
hello = \n
yay!
"
This program exhibits undefined behavior. From the C99 standard (which governs the C standard library functions and is incorporated into the C++ standard by reference):
7.19.5.3/6 When a file is opened with update mode ('+' as the second or third character in the above list of mode argument values), both input and output may be performed on the associated stream. However, output shall not be directly followed by input without an intervening call to the fflush function or to a file positioning function (fseek, fsetpos, or rewind), and input shall not be directly followed by output without an intervening call to a file positioning function, unless the input operation encounters end-of-file.
You have output (fputs) immediately followed by input (getc) without an intervening fflush or a file positioning function.
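A minimal corrected sketch of the program above, assuming the intent is to overwrite the bytes after each '=' in place. It stores getc's result in an int (a char cannot reliably hold EOF), uses SEEK_CUR instead of the magic number 1, opens the file in binary update mode ("r+b") because for text streams the C standard only allows an fseek offset of 0 or a value from ftell, and calls fseek(fp, 0, SEEK_CUR) between the fputs and the next getc. The function name overwrite_after_equals is hypothetical.

```cpp
#include <cstdio>

// Hypothetical helper: after every '=' it skips one byte and writes "LOL"
// over the bytes that follow (overwriting, not inserting).
// Returns false if the file cannot be opened or a write fails.
bool overwrite_after_equals(const char* path) {
    std::FILE* fp = std::fopen(path, "r+b");  // binary update mode
    if (fp == nullptr) return false;
    int ch;  // int, not char, so EOF can be told apart from a real byte
    while ((ch = std::getc(fp)) != EOF) {
        if (ch == '=') {
            // input -> output: a positioning call is required in between
            std::fseek(fp, 1, SEEK_CUR);
            if (std::fputs("LOL", fp) == EOF) {
                std::fclose(fp);
                return false;
            }
            // output -> input: again a positioning call (or fflush) is required
            std::fseek(fp, 0, SEEK_CUR);
        }
    }
    std::fclose(fp);
    return true;
}
```

Assuming the \n in the question's file is a real newline, the file hello = \nyay!\n becomes hello = LOLy!\n: the three bytes after "= " are replaced, not shifted.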
Text is stored contiguously on disk. Just like with an array, if you want to insert text, you must first shift the existing content back; otherwise, writing simply overwrites (overlays) the bytes that are already there.
I am making a device that moves back and forth and needs to store its last position so that, upon power-up, the last stored value can be read from the last line of a file on an SD card and operation can resume. This file will then be deleted and rewritten. For this particular application, homing and other methods cannot be used because the device must start where it last was. Because position is tracked via an encoder, there is no positional memory otherwise. The file is set up as a single data column separated by commas.
Currently I am successfully writing to the SD card as the position changes, and reading the entire file to print it on the Serial monitor. However, I really only need the last value. The length of the file will always be different due to system operation.
I have read a lot of different solutions but none of them seem to work for my application.
I can read the entire file using:
void read_file() {
    // open the file for reading:
    myFile = SD.open("test8.txt");
    if (myFile) {
        Serial.println("test8.txt:");
        // read from the file until there's nothing else in it:
        while (myFile.available()) {
            String a = "";
            for (int i = 0; i < 9; ++i)
            {
                int j;
                char temp = myFile.read();
                if (temp != ',' && temp != '\r')
                { //a=temp;
                    a += temp;
                }
                else if (temp == ',' || temp == '\r') {
                    j = a.toInt();
                    // Serial.println(a);
                    Serial.println(j);
                    break;
                }
            }
        }
        // close the file:
        myFile.close();
    } else {
        // if the file didn't open, print an error:
        Serial.println("error opening test8.txt");
    }
}
This gives me a stream of the values separated by 0 like this:
20050
0
20071
0
20092
0
20113
0
20133
0
Ideally I just need 20133 to be grabbed and stored as an int.
I have also tried:
void read_file_3() {
    // open the file for reading:
    myFile = SD.open("test8.txt");
    if (myFile) {
        Serial.println("test8.txt:");
        // read from the file until there's nothing else in it:
        Serial.println(myFile.seek(myFile.size()));
        // close the file:
        myFile.close();
    } else {
        // if the file didn't open, print an error:
        Serial.println("error opening test8.txt");
    }
}
This only returns "1", which does not make any sense to me.
Update:
I have found a sketch that does what I want, however it is very slow due to the use of string class. Per post #6 here: https://forum.arduino.cc/index.php?topic=379209.0
This does grab the last stored value; however, it takes quite a while as the file gets bigger, and it may blow up memory.
How could this be done without the string class?
void read_file() {
    // open the file for reading:
    myFile = SD.open("test8.txt");
    if (myFile) {
        while (myFile.available())
        {
            String line_str = myFile.readStringUntil(','); // string value read from the stream, from comma to comma
            int line = line_str.toInt();
            if (line != 0) // checking for the last non-zero value
            {
                line2 = line; // this really does the trick
            }
            // Serial.print(line2);
            // delay(100);
        }
        Serial.print("Last line = ");
        Serial.print(line2);
        // close the file:
        myFile.close();
        // SD.remove("test3.txt");
    } else {
        // if the file didn't open, print an error:
        Serial.println("error opening test8.txt");
    }
}
Any help would be greatly appreciated!
seek() returns true if it successfully moves to that position and false if it cannot (for instance, if the file isn't that big). It does not give you the value at that position. That's why you see a 1: seek() returns true because it was able to go to position myFile.size(), and that's what you're printing.
Beyond that, you don't want to go to the end of the file, that would be after your number. You want to go to a position 5 characters before the end of the file if your number is 5 digits long.
Either way, once you seek that position, then you still need to use read just like you did in your first code to actually read the number. seek doesn't do that, it just takes you to that position in the file.
EDIT: Since you edited the post, I'll edit the answer to go along. You're going backwards. You had it right the first time. Use the same read method you started with, just seek the end of the file before you start reading so you don't have to read all the way through. You almost had it. The only thing you did wrong the first time was printing what you got back from seek instead of seeking the right position and then reading the file.
That thing you looked up with the String class is going backward from where you were. Forget you ever saw that. It's doing the same thing you were already doing in the first place only it's also wasting a lot of memory and code space in the process.
Use your original code and just add a seek to skip to the end of the file.
This assumes that it's always a 5 digit number. If not then you may need a little bit of tweaking:
void read_file() {
    // open the file for reading:
    myFile = SD.open("test8.txt");
    if (myFile) {
        Serial.println("test8.txt:");
        /// ADDED THIS ONE LINE TO SKIP MOST OF THE FILE************
        myFile.seek(myFile.size() - 5);
        // read from the file until there's nothing else in it:
        while (myFile.available()) {
            String a = "";
            for (int i = 0; i < 9; ++i)
            {
                int j;
                char temp = myFile.read();
                if (temp != ',' && temp != '\r')
                { //a=temp;
                    a += temp;
                }
                else if (temp == ',' || temp == '\r') {
                    j = a.toInt();
                    // Serial.println(a);
                    Serial.println(j);
                    break;
                }
            }
        }
        // close the file:
        myFile.close();
    } else {
        // if the file didn't open, print an error:
        Serial.println("error opening test8.txt");
    }
}
See, all I've done is take your original function and add one line that seeks near the end of the file before reading.
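For the "without the String class" part of the question, one option is to read only a small tail of the file into a char buffer (for example with myFile.seek() followed by myFile.read(buf, len) from the SD library) and parse the last number from that buffer. The sketch below shows only the parsing step, in plain C++ so it can be tested off the board. The function name last_value, the rule that trailing non-digit bytes such as ',', '\r' and '\n' are skipped, and the requirement that the tail window be longer than one record are assumptions, not part of the answer above.

```cpp
#include <cstddef>

// Hypothetical helper: parses the last (optionally negative) integer in
// buf[0..len). Trailing non-digit bytes such as ',', '\r' and '\n' are skipped.
// Returns false if the buffer contains no digits. The buffer must start before
// the beginning of the last value (the tail window must be longer than one
// record); otherwise the value is silently truncated.
bool last_value(const char* buf, std::size_t len, long& out) {
    auto is_digit = [](char c) { return c >= '0' && c <= '9'; };
    std::size_t end = len;
    while (end > 0 && !is_digit(buf[end - 1])) --end;  // skip trailing delimiters
    if (end == 0) return false;
    std::size_t begin = end;
    while (begin > 0 && is_digit(buf[begin - 1])) --begin;  // walk back over digits
    long value = 0;
    for (std::size_t i = begin; i < end; ++i) {
        value = value * 10 + (buf[i] - '0');
    }
    if (begin > 0 && buf[begin - 1] == '-') value = -value;  // optional sign
    out = value;
    return true;
}
```

On the board, this would be called on the bytes read from the last few dozen bytes of the file (the window size is an assumption to tune to the record length), so no String objects are created and the cost no longer grows with the file size.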
I have a program that is given a file of a shape and reads it line by line to be able to put the shape into a 2D array. It's of an unknown size, so I have to count the rows as we go. Everything works fine, except I'm having trouble getting it to stop when there are empty lines trailing the input.
The piece of code in question is below:
while(cin.eof() != true){
    getline(cin, input);
    shape = shape + input;
    rows++;
}
For example this will count 3 rows:
===
===
===
and this counts 4:
===
===
===
(empty line)
I need my program to ignore the empty lines, regardless of how many there are.
I've tried quite a few different things such as
if (!input.empty()){
    shape = shape + input;
    rows++;
}
or
if (input != " " && input[0] != '\0' && input[0] != '\n'){
    shape = shape + input;
    rows++;
}
These work if there is only one empty line, but if I had multiple empty lines it will only not count the very last one.
Shape and Input are both strings.
You have made a good choice to read a line at a time with getline(), but you are controlling your read loop incorrectly. See Why !.eof() inside a loop condition is always wrong.
Instead always control the continuation of your read loop based on the stream state resulting from the read function itself. In your case, you ignore the state after getline() and assume you have valid input -- which you won't when you read EOF. Why?
When you read the last line in your file, you will have read input, but the eofbit will not yet be set as you haven't reached the end-of-file yet. You loop checking cin.eof() != true (which it isn't yet) and then call getline (cin, input) BAM! Nothing was read and eofbit is now set, yet you blindly assign shape = shape + input; even though your read with getline() failed.
Your second issue is how do you skip empty lines? Simple. If input.size() == 0 the line was empty. To "skip" empty lines, just continue and read the next. To "quit reading" when the first empty line is reached, replace continue with break.
A short example incorporating the changes above would be:
#include <iostream>
#include <string>
int main (void) {
    std::string input{}, shape{};
    std::size_t rows = 0;
    while (getline(std::cin, input)) {  /* control loop with getline */
        if (input.size() == 0)          /* if .size() == 0, empty line */
            continue;                   /* get next */
        shape += input;                 /* add input to shape */
        rows++;                         /* increment rows */
    }
    std::cout << rows << " rows\n" << shape << '\n';
}
Also see: Why is “using namespace std;” considered bad practice? and avoid developing habits that will be harder to break later.
Example Use/Output
$ cat << eof | ./bin/inputshape
> ===
> ===
> ===
> eof
3 rows
=========
With a blank line at the end
$ cat << eof | ./bin/inputshape
> ===
> ===
> ===
>
> eof
3 rows
=========
Or with multiple blank lines:
$ cat << eof | ./bin/inputshape
> ===
> ===
> ===
>
>
> eof
3 rows
=========
(note: the eof used in input above is simply the heredoc sigil marking the end of input and has no independent significance related to the stream state eofbit or .eof(). It could just as well have been bananas, but EOF or eof are generally/traditionally used. Also, if you are not using bash or another shell supporting a heredoc, just redirect a file to the program, e.g. ./bin/inputshape < yourfile)
Look things over and let me know if you have further questions.
Edit Based On No Use of continue or break
If you can't use continue or break, then just turn the conditional around and only add to shape if input.size() != 0. For example:
while (getline(std::cin, input)) {  /* control loop with getline */
    if (input.size() != 0) {        /* if .size() != 0, good line */
        shape += input;             /* add input to shape */
        rows++;                     /* increment rows */
    }
}
Exact same thing, just written a bit differently. Let me know if that works for you.
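One more case worth checking: if the input was saved with Windows (\r\n) line endings and is read on a system that does not translate them, a "blank" line arrives as "\r", so input.size() != 0 and it is still counted. The sketch below is a hypothetical count_rows variant (it takes any std::istream, so it can be fed std::cin or a std::istringstream) that strips a trailing '\r' and treats whitespace-only lines as empty. It uses neither continue nor break.

```cpp
#include <cstddef>
#include <istream>
#include <string>

// Hypothetical helper: reads lines until getline() fails, drops a trailing
// '\r' from each, ignores lines that are empty or contain only spaces/tabs,
// and appends the rest to shape. Returns the number of rows kept.
std::size_t count_rows(std::istream& in, std::string& shape) {
    std::string input;
    std::size_t rows = 0;
    while (std::getline(in, input)) {
        if (!input.empty() && input.back() == '\r')
            input.pop_back();  // CR left over from a \r\n line ending
        if (input.find_first_not_of(" \t") != std::string::npos) {
            shape += input;    // line has at least one non-blank character
            ++rows;
        }
    }
    return rows;
}
```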
I have several input files like this:
>1aab_
GKGDPKKPRGKMSSYAFFVQTSREEHKKKHPDASVNFSEFSKKCSERWKT
MSAKEKGKFEDMAKADKARYEREMKTYIPPKGE
>1j46_A
MQDRVKRPMNAFIVWSRDQRRKMALENPRMRNSEISKQLGYQWKMLTEAE
KWPFFQEAQKLQAMHREKYPNYKYRPRRKAKMLPK
>1k99_A
MKKLKKHPDFPKKPLTPYFRFFMEKRAKYAKLHPEMSNLDLTKILSKKYK
ELPEKKKMKYIQDFQREKQEFERNLARFREDHPDLIQNAKK
>2lef_A
MHIKKPLNAFMLYMKEMRANVVAESTLKESAAINQILGRRWHALSREEQA
KYYELARKERQLHMQLYPGWSARDNYGKKKKRKREK
Here is what I have to do:
vector <string> names;
vector <string> seqs;
names.resize(total); //"total" is already known.
seqs.resize(total);
counter=0;char input;
while ((input = myInput.get()) != EOF)
{
    if(input=='>')
        names[counter]= take the whole line (>1aab_, >1j46_A, and so on)
    else
        until seeing the next '>', append the character to sequence[counter]
    counter++;
}
Finally it will be like this:
names[0]=">1aab_"
sequence[0]="GKGDPKKPRGKMSSYAFFVQTSREEHKKKHPDASVNFSEFSKKCSERWKTMSAKEKGKFEDMAKADKARYEREMKTYIPPKGE"
and so on..
I have been thinking about this for 2 hours and couldn't figure it out. Can anyone help with that? Thanks in advance.
There's a few ways to solve it; I'll present some examples but I'm not testing/compiling this code, so there may be minor bugs - the logic is the important bit.
Since your pseudocode appears to be processing the input character by character, I've taken that as a requirement.
The way you seem to be thinking about it would be implemented with essentially a pair of loops - one for reading the name, the other for reading the sequence - which are enclosed in an outer loop, in order to process all records.
This would look something like the following:
// note: input must be declared as an int (not a char) so it can hold EOF
// first character in file should be a '>', indicating the start
// of a record.
input = myInput.get();
if (input != '>')
{
    std::cerr << "Malformed input file!" << std::endl;
    return /*...*/;
}
do
{
    // record name continues up until the newline
    while ((input = myInput.get()) != EOF)
    {
        if (input == '\n' || input == '\r')
            break;
        names[counter].push_back(input);
    }
    // read sequence until we hit a '>' or EOF
    while ((input = myInput.get()) != EOF)
    {
        if (input == '>')
        {
            // advance to next record number
            counter++;
            break;
        }
        // skip line endings so the sequence lines are joined together
        if (input == '\n' || input == '\r')
            continue;
        sequence[counter].push_back(input);
    }
} while (input != EOF && counter < total);
You'll also notice I moved the check for the initial '>' to before the loop, just as a way of ingesting (and discarding) the character, as well as a basic sanity check of the input. This is because we really use this character to mark the end of the sequence (rather than the "start of a record") - when we enter the loop, we assume we're already reading the record name.
Another way to approach it is to use a state machine. Essentially, this utilises additional variables to track the state the parser is in.
For this particular case, you only have two states: either you're reading a record name, or the sequence. So, we can just use a single boolean to track which state we're in.
Armed with the state variable, we can then make decisions about what to do with the character we just read based upon the state we're in. At the simplest level here, if we're in "read the record name" state, we add the character to the names variable, otherwise we add it to the sequence variable.
// state flag to indicate if we're currently reading a name line,
// i.e. a line starting with ">"
// This should be set true by the first record we encounter, so
// we'll set it false (to indicate we're reading a sequence) in
// order to allow us to detect bad input files.
bool reading_name = false;
// indicate we're on the first record, so we can avoid incrementing
// the record counter
bool first_record = true;
// process input character-by-character until end of file
while ((input = myInput.get()) != EOF)
{
    // check for start of new record
    if (input == '>')
    {
        // for robustness, verify we're not already reading a name,
        // as this probably indicates invalid input
        if (reading_name)
        {
            std::cerr << "Input is malformed?!" << std::endl;
            break;
        }
        // switch to reading name state
        reading_name = true;
        // advance to next record, but only if it isn't the first record
        if (first_record)
        {
            // disable the first_record flag, and explicitly set the
            // record counter to 0.
            first_record = false;
            counter = 0;
        }
        else if (++counter >= total)
        {
            std::cerr << "Error: too many records!" << std::endl;
            break;
        }
    }
    // first character in file should start a new record
    else if (first_record)
    {
        std::cerr << "Missing record start character at beginning of input!" << std::endl;
        break;
    }
    // make sure we are processing a valid record number
    else if (counter >= total)
    {
        std::cerr << "Invalid record number!" << std::endl;
        break;
    }
    // continue reading the name
    else if (reading_name)
    {
        // check if we've reached the end of the line; '\r' is checked
        // too in case the input files have Windows-style line endings
        if (input == '\n' || input == '\r')
        {
            // switch to reading sequence state
            reading_name = false;
        }
        else
        {
            // add character to current name
            names[counter].push_back(input);
        }
    }
    // continue reading the sequence
    else
    {
        // skip line ending characters so the sequence lines are joined
        if (input != '\n' && input != '\r')
        {
            // add character to current sequence
            sequence[counter].push_back(input);
        }
    }
}
This adds a fair amount of complexity, which is of questionable value for this particular exercise, but does make adding additional states easier in future. It also has the benefit of only a single place in the code where I/O is done, which reduces the chances of errors (not checking for EOF, overflow array bounds, etc.).
In this case, we're actually using the '>' character as an indicator that a new record is starting, so we add a bit of extra logic to make sure that all works properly with the record counter. You could also just use a signed integer for your counter variable and start it at -1, so it will increment to 0 at the start of the first record, but using signed variables to index into arrays isn't a good idea.
There are more ways to approach this problem, but hopefully this gives you somewhere to start on your own solution.
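For comparison, here is a line-oriented sketch that drops the character-by-character requirement and uses std::getline, which avoids most of the EOF bookkeeping. It assumes the format shown in the question (a '>' header line followed by one or more sequence lines), keeps the '>' in the name as in the question's expected output, strips a trailing '\r' so Windows line endings do not leak into the data, and grows the vectors with push_back instead of relying on a precomputed total. The function name read_records is hypothetical.

```cpp
#include <istream>
#include <string>
#include <vector>

// Hypothetical helper: parses FASTA-style records. Each '>' line starts a new
// record (the name keeps its '>'); following lines are concatenated into that
// record's sequence. Assumes names and seqs start with the same size (e.g. both
// empty). Returns false if sequence data appears before the first '>' line.
bool read_records(std::istream& in, std::vector<std::string>& names,
                  std::vector<std::string>& seqs) {
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();        // tolerate \r\n line endings
        if (line.empty())
            continue;               // ignore blank lines
        if (line[0] == '>') {
            names.push_back(line);
            seqs.emplace_back();    // start an empty sequence for this record
        } else if (names.empty()) {
            return false;           // malformed: data before any header
        } else {
            seqs.back() += line;    // append this line to the current sequence
        }
    }
    return true;
}
```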
While trying to read a simple ANSI-encoded text file in text mode (Windows), I came across some strange behaviour with seekg() and tellg(); any time I used tellg(), saved its value (as pos_type), and then sought back to it later, I would always wind up further ahead in the stream than where I left off.
Eventually I did a sanity check; even if I just do this...
#include <fstream>
#include <string>

int main()
{
    std::ifstream dataFile("myfile.txt",
                           std::ifstream::in);
    if (dataFile.is_open() && !dataFile.fail())
    {
        while (dataFile.good())
        {
            std::string line;
            dataFile.seekg(dataFile.tellg());
            std::getline(dataFile, line);
        }
    }
}
...then eventually, further into the file, lines are half cut-off. Why exactly is this happening?
This issue is caused by libstdc++ computing the current offset by subtracting the number of bytes still remaining in its buffer from the file position reported by lseek64.
The buffer is filled using the return value of read, which for a text-mode file on Windows is the number of bytes placed in the buffer after end-of-line conversion (i.e. each 2-byte \r\n line ending is converted to \n; Windows also seems to append a spurious newline to the end of the file).
lseek64, however (which with MinGW results in a call to _lseeki64), returns the current absolute file position, so once the two values are subtracted you end up with an offset that is off by 1 for each remaining newline in the text file (+1 for the extra newline).
The following code should demonstrate the issue; you can even use a file with a single character and no newlines, because of the extra newline inserted by Windows.
#include <iostream>
#include <fstream>
int main()
{
    std::ifstream f("myfile.txt");
    for (char c; f.get(c);)
        std::cout << f.tellg() << ' ';
}
For a file containing only the character 'a', I get the following output:
2 3
Clearly off by 1 for the first call to tellg. After the second call the file position is correct as the end has been reached after taking the extra newline into account.
Aside from opening the file in binary mode, you can circumvent the issue by disabling buffering:
#include <iostream>
#include <fstream>
int main()
{
    std::ifstream f;
    f.rdbuf()->pubsetbuf(nullptr, 0);
    f.open("myfile.txt");
    for (char c; f.get(c);)
        std::cout << f.tellg() << ' ';
}
but this is far from ideal.
Hopefully MinGW / MinGW-w64 or GCC can fix this, but first we'll need to determine who is responsible for fixing it. I suppose the underlying issue is with Microsoft's implementation of lseek, which should return appropriate values according to how the file has been opened.
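The binary-mode workaround mentioned in this answer can be sketched as follows. With std::ios::binary there is no end-of-line conversion, so tellg() and seekg() operate on raw byte offsets and a saved position can be returned to exactly; the price is that each line read with std::getline keeps its trailing '\r', which has to be stripped by hand. The helper name read_line_crlf is hypothetical.

```cpp
#include <istream>
#include <string>

// Hypothetical helper: std::getline for a stream opened with std::ios::binary,
// removing the '\r' that precedes '\n' in Windows-style line endings.
// Returns false when no further line could be read.
bool read_line_crlf(std::istream& in, std::string& line) {
    if (!std::getline(in, line)) return false;
    if (!line.empty() && line.back() == '\r') line.pop_back();
    return true;
}
```

It can replace std::getline in the loop from the question, with the file opened as std::ifstream dataFile("myfile.txt", std::ios::binary);.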
Thanks for this, though it's a very old post. I was stuck on this problem for more than a week. Here are some code examples from my site (the menu versions 1 and 2). Version 1 uses the solution presented here, in case anyone wants to see it.
:)
void customerOrder::deleteOrder(char* argv[]){
    std::fstream newinFile,newoutFile;
    newinFile.rdbuf()->pubsetbuf(nullptr, 0);
    newinFile.open(argv[1],std::ios_base::in);
    if(!(newinFile.is_open())){
        throw "Could not open file to read customer order. ";
    }
    newoutFile.open("outfile.txt",std::ios_base::out);
    if(!(newoutFile.is_open())){
        throw "Could not open file to write customer order. ";
    }
    newoutFile.seekp(0,std::ios::beg);
    std::string line;
    int skiplinesCount = 2;
    if(beginOffset != 0){
        //write file from zero to beginoffset and from endoffset to eof If to delete is non-zero
        //or write file from zero to beginoffset if to delete is non-zero and last record
        newinFile.seekg (0,std::ios::beg);
        // if primarykey < largestkey , it's a middle record
        customerOrder order;
        long tempOffset(0);
        int largestKey = order.largestKey(argv);
        if(primaryKey < largestKey) {
            //stops right before "current..." next record.
            //(checking getline avoids looping forever if tellg() fails and returns -1)
            while((tempOffset < beginOffset) && (std::getline(newinFile,line))){
                newoutFile << line << std::endl;
                tempOffset = newinFile.tellg();
            }
            newinFile.seekg(endOffset);
            //skip two lines between records.
            for(int i=0; i<skiplinesCount;++i) {
                std::getline(newinFile,line);
            }
            while( std::getline(newinFile,line) ) {
                newoutFile << line << std::endl;
            }
        } else if (primaryKey == largestKey){
            //it's the last record.
            //write from zero to beginoffset.
            while((tempOffset < beginOffset) && (std::getline(newinFile,line)) ) {
                newoutFile << line << std::endl;
                tempOffset = newinFile.tellg();
            }
        } else {
            throw "Error in delete key";
        }
    } else {
        //it's the first record.
        //write file from endoffset to eof
        //works with endOffset - 4 (but why??)
        newinFile.seekg (endOffset);
        //skip two lines between records.
        for(int i=0; i<skiplinesCount;++i) {
            std::getline(newinFile,line);
        }
        while(std::getline(newinFile,line)) {
            newoutFile << line << std::endl;
        }
    }
    newoutFile.close();
    newinFile.close();
}
beginOffset is a specific point in the file (the beginning of each record), and endOffset is the end of the record, calculated in another function (findFoodOrder) with tellg. I did not add that function here because it would make this very lengthy, but you can find it on my site (under the menu version 1 link):
http://www.buildincode.com