Using seekg() in text mode - c++

While trying to read in a simple ANSI-encoded text file in text mode (Windows), I came across some strange behaviour with seekg() and tellg(); Any time I tried to use tellg(), saved its value (as pos_type), and then seek to it later, I would always wind up further ahead in the stream than where I left off.
Eventually I did a sanity check; even if I just do this...
int main()
{
std::ifstream dataFile("myfile.txt",
std::ifstream::in);
if (dataFile.is_open() && !dataFile.fail())
{
while (dataFile.good())
{
std::string line;
dataFile.seekg(dataFile.tellg());
std::getline(dataFile, line);
}
}
}
...then eventually, further into the file, lines are half cut-off. Why exactly is this happening?

This issue is caused by libstdc++ using the difference between the current remaining buffer with lseek64 to determine the current offset.
The buffer is set using the return value of read, which for a text mode file on windows returns the number of bytes that have been put into the buffer after endline conversion (i.e. the 2 byte \r\n endline is converted to \n, windows also seems to append a spurious newline to the end of the file).
lseek64 however (which with mingw results in a call to _lseeki64) returns the current absolute file position, and once the two values are subtracted you end up with an offset that is off by 1 for each remaining newline in the text file (+1 for the extra newline).
The following code should display the issue, you can even use a file with a single character and no newlines due to the extra newline inserted by windows.
#include <iostream>
#include <fstream>
int main()
{
std::ifstream f("myfile.txt");
for (char c; f.get(c);)
std::cout << f.tellg() << ' ';
}
For a file with a single a character I get the following output
2 3
Clearly off by 1 for the first call to tellg. After the second call the file position is correct as the end has been reached after taking the extra newline into account.
Aside from opening the file in binary mode, you can circumvent the issue by disabling buffering
#include <iostream>
#include <fstream>
int main()
{
std::ifstream f;
f.rdbuf()->pubsetbuf(nullptr, 0);
f.open("myfile.txt");
for (char c; f.get(c);)
std::cout << f.tellg() << ' ';
}
but this is far from ideal.
Hopefully mingw / mingw-w64 or gcc can fix this, but first we'll need to determine who would be responsible for fixing it. I suppose the base issue is with MSs implementation of lseek which should return appropriate values according to how the file has been opened.

Thanks for this , though it's a very old post. I was stuck on this problem for more then a week. Here's some code examples on my site (the menu versions 1 and 2). Version 1 uses the solution presented here, in case anyone wants to see it .
:)
void customerOrder::deleteOrder(char* argv[]){
std::fstream newinFile,newoutFile;
newinFile.rdbuf()->pubsetbuf(nullptr, 0);
newinFile.open(argv[1],std::ios_base::in);
if(!(newinFile.is_open())){
throw "Could not open file to read customer order. ";
}
newoutFile.open("outfile.txt",std::ios_base::out);
if(!(newoutFile.is_open())){
throw "Could not open file to write customer order. ";
}
newoutFile.seekp(0,std::ios::beg);
std::string line;
int skiplinesCount = 2;
if(beginOffset != 0){
//write file from zero to beginoffset and from endoffset to eof If to delete is non-zero
//or write file from zero to beginoffset if to delete is non-zero and last record
newinFile.seekg (0,std::ios::beg);
// if primarykey < largestkey , it's a middle record
customerOrder order;
long tempOffset(0);
int largestKey = order.largestKey(argv);
if(primaryKey < largestKey) {
//stops right before "current..." next record.
while(tempOffset < beginOffset){
std::getline(newinFile,line);
newoutFile << line << std::endl;
tempOffset = newinFile.tellg();
}
newinFile.seekg(endOffset);
//skip two lines between records.
for(int i=0; i<skiplinesCount;++i) {
std::getline(newinFile,line);
}
while( std::getline(newinFile,line) ) {
newoutFile << line << std::endl;
}
} else if (primaryKey == largestKey){
//its the last record.
//write from zero to beginoffset.
while((tempOffset < beginOffset) && (std::getline(newinFile,line)) ) {
newoutFile << line << std::endl;
tempOffset = newinFile.tellg();
}
} else {
throw "Error in delete key"
}
} else {
//its the first record.
//write file from endoffset to eof
//works with endOffset - 4 (but why??)
newinFile.seekg (endOffset);
//skip two lines between records.
for(int i=0; i<skiplinesCount;++i) {
std::getline(newinFile,line);
}
while(std::getline(newinFile,line)) {
newoutFile << line << std::endl;
}
}
newoutFile.close();
newinFile.close();
}
beginOffset is a specific point in the file (beginning of each record) , and endOffset is the end of the record, calculated in another function with tellg (findFoodOrder) I did not add this as it may become very lengthy, but you can find it on my site (under: menu version 1 link) :
http://www.buildincode.com

Related

Why is C++'s file I/O ignoring initial empty lines when reading a text file? How can I make it NOT do this?

I'm trying to build myself a mini programming language using my own custom regular expression and abstract syntax tree parsing library 'srl.h' (aka. "String and Regular-Expression Library") and I've found myself an issue I can't quite seem to figure out.
The problem is this: When my custom code encounters an error, it obviously throws an error message, and this error message contains information about the error, one bit being the line number from which the error was thrown.
The issue comes in the fact that C++ seems to just be flat out ignoring the existence of lines which contain no characters (ie. line that are just the CRLF) until it finds a line which does contain characters, after which point it stops ignoring empty lines and treats them properly, thus giving all errors thrown an incorrect line number, with them all being incorrect by the same offset.
Basically, if given a file which contains the contents "(crlf)(crlf)abc(crlf)def", it'll be read as though its content were "abc(crlf)def", ignoring the initial new lines and thus reporting the wrong line number for any and all errors thrown.
Here's a copy of the (vary messily coded) function I'm using to get the text of a text file. If one of y'all could tell me what's going on here, that'd be awesome.
template<class charT> inline std::pair<bool, std::basic_string<charT>> load_text_file(const std::wstring& file_path, const char delimiter = '\n') {
std::ifstream fs(file_path);
std::string _nl = srl::get_nlp_string<char>(srl::newline_policy);
if (fs.is_open()) {
std::string s;
char b[SRL_TEXT_FILE_MAX_CHARS_PER_LINE];
while (!fs.eof()) {
if (s.length() > 0)
s += _nl;
fs.getline(b, SRL_TEXT_FILE_MAX_CHARS_PER_LINE, delimiter);
s += std::string(b);
}
fs.close();
return std::pair<bool, std::basic_string<charT>>(true, srl::string_cast<char, charT>(s));
}
else
return std::pair<bool, std::basic_string<charT>>(false, std::basic_string<charT>());
}
std::ifstream::getline() does not input the delimiter (in this case, '\n') into the string and also flushes it from the stream, which is why all the newlines from the file (including the leading ones) are discarded upon reading.
The reason it seems the program does not ignore newlines between other lines is because of:
if (s.length() > 0)
s += _nl;
All the newlines are really coming from here, but this cannot happen at the very beginning, since the string is empty.
This can be verified with a small test program:
#include <iostream>
#include <fstream>
#include <string>
int main()
{
std::ifstream inFile{ "test.txt" }; //(crlf)(crlf)(abc)(crlf)(def) inside
char line[80]{};
int lineCount{ 0 };
std::string script;
while (inFile.peek() != EOF) {
inFile.getline(line, 80, '\n');
lineCount++;
script += line;
}
std::cout << "***Captured via getline()***" << std::endl;
std::cout << script << std::endl; //prints "abcdef"
std::cout << "***End***" << std::endl << std::endl;
std::cout << "Number of lines: " << lineCount; //result: 5, so leading /n processed
}
If the if condition is removed, so the program has just:
s += _nl;
, newlines will be inserted instead of the discarded ones from the file, but as long as '\n' is the delimiter, std::ifstream::getline() will continue discarding the original ones.
As a final touch, I would suggest using
while (fs.peek() != EOF){};
instead of
while(fs){}; or while(!fs.eof()){};
If you look at int lineCount's final value in the test program, the latter two give 6 instead of 5, as they make a redundant iteration in the end.

bool function doesn't work as a while condition

I am trying to to check if a file has successfully opened, read from it and output what I've read from it all in one function, because I have 7 files to operate on in the same code and I want to avoid writing the same code over and over again.
So I have made a bool function and put it as a while condition.
If I succeed, the function returns true and if I don't it returns false. So a while(!function) should keep trying until it works, correct ? And the answer is yes, it works as intended.
But if I change the condition of the while to while(function) one would expect to repeat the function until it fails somehow (maybe it can't open the file.). But it doesn't behave as expected. It only works correctly on the first while iteration.
This is the example:
#include <iostream>
#include <unistd.h>
#include <string.h>
#include <fstream>
#include <sstream>
bool readConfig(std::fstream& file, std::string (&str)[10], std::string identity) {
if(file.is_open()) {
if(file.seekg(0)) {
std::cout<<"Start from 0"<<std::endl;
}
// Get content line by line of txt file
int i = 0;
while(getline(file, str[i++]));
std::cout<<"i= "<<i<<std::endl;
for(int k = 0; k<i; k++) {
std::cout<<identity<<" = "<<str[k]<<std::endl;
}
return true;
} else {
std::cout<<"ERROR ! Could not open file."<<std::endl;
return false;
}
}
int main() {
char configFilePath[]="test.txt";
std::fstream configFile;
configFile.open(configFilePath, std::fstream::in);
std::string test[10];
std::string id = "testing";
while(!readConfig(configFile, test,id)) {
usleep(1000*1000);
};
return 0;
}
This is the content of test.txt :
line 1
line 2
line 3
line 4
This is the output:
Start from 0
i= 5
testing = line 1
testing = line 2
testing = line 3
testing = line 4
testing =
i= 1
testing = line 1
i= 1
testing = line 1
and so on.
Why does it work on the first iteration but then it stops at i=1 ? I am asking because I don't know if what I did is correct or not. while(!function) works, but maybe it won't work all the time, maybe my code is flawed.
Or maybe while(getline(configFile, string[i++])); is at fault here ?
This is the code I am trying to replace:
void readConfig(std::fstream& configFile, std::string (&str)[10], std::string identity) {
if(configFile) {
// Get content line by line of txt file
int i = 0;
while(getline(configFile, str[i++]));
//for debug only
if((i-1) == 0) {
std::cout<<identity<<" = "<<str[i-1]<<std::endl;
} else {
for(int k = 0; k<i-1; k++) {
std::cout<<identity<<" = "<<str[k]<<std::endl;
}
}
} else {
log("ERROR ! Could not get content from file.");
}
}
int main() {
file.open(file, std::fstream::in);
if(file.is_open()) {
std::cout<<"Successfully opened URL Display Text file."<<std::endl;
std::string inputs[10];
std::string id = "url_text";
readConfig(file, inputs, id);
file.close();
} else {
// Could not open file
log("Error ! Could not open file.");
}
}
I do this 7 times, instead of just calling a function 7 times, that does all of that.
But if I change the condition of the while to while(function) one would expect to repeat the function until it fails somehow (maybe it can't open the file.).
You reasoning is off here. The function is not opening the file, so that is nothing that can go wrong on the next iteration when it suceeded on the first.
What the function does is: it reads all the contets of the file, then returns true. And subsequent iterations there is nothing left to read, but still the function returns true.
You should check if the file is open only once, not in each iteration. If the function is supposed to read a single line then make it so, currently it reads all.
Change the test from if (file.is_open()) to if (file). Failing to open the file is not the only way that a file stream can end up in a bad state. In particular, on the second call to this function, the stream is open, but it's in a failed state because the last read attempt failed.
If you just want to read the file line by line, print the lines and store them, I'd do it like this.
Rather than using a c-style array use a std::vector or std::array
Check the if the file is open before you call the read function
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
void readConfig(std::ifstream& configFile,
std::vector<std::string>& lines,
const unsigned int limit) {
std::string line;
while (std::getline(configFile, line)) {
if (lines.size() >= limit) {
break;
}
lines.push_back(line);
}
}
int main() {
const std::array<std::string, 3> fileNames = {"test1.txt",
"test2.txt",
"test3.txt"};
// Iterate over all your files
for (const auto& fileName : fileNames) {
// Open the file
std::ifstream configFile(fileName);
if (!configFile.is_open()) {
std::cout << "ERROR! Could not open file.\n";
continue;
}
// Read the file
std::vector<std::string> lines;
constexpr unsigned int limit = 4;
readConfig(configFile, lines, limit);
if (configFile.is_open()) {
configFile.close();
}
// Work with the file content
std::cout << fileName << "\n";
for (const auto& line : lines) {
std::cout << "testing = " << line << "\n";
}
std::cout << "\n";
}
return 0;
}
Your output paints a fairly clear picture of what is going on. You have enough debugging output to identify what choices have been made. They key point I would focus on is the following sequence:
testing =
i= 1
The first of these lines is the fifth line read from your four-line file. Not surprisingly, there is nothing there. The next output line comes from the next invocation of readConfig, somewhere in the branch where file.isopen() is true. However, note that there is not a line saying "Start from 0" between these two. That means file converted to false after the call to file.seekg(0) (the value returned by that function is file, not directly a boolean). This indicates some sort of error state, and one should expect that error state to persist until cleared. And there is no attempt made to clear it.
The next bit of code is the getline loop. As with seekg, the getLine function returns the stream (file) rather than a boolean. As expected, the error state has persisted, making the loop condition false, hence no iterations of the loop.
testing = line 1
The next line of output is ambiguous. It could indicate that the position was successfully changed to the start of the file, and that the first line of input was successfully read. Or it could indicate that the call to getLine returned before erasing the provided string, leaving the contents from the first call to readConfig. I'm thinking the latter, but you could check for yourself by manually erasing str[0] before the getline loop.
In general, reusing resources like this makes debugging harder because the results could be misleading. Debugging would be less confusing if str was a local variable instead of a parameter. Similar for file – instead of a stream parameter, you could pass a string with the name of the file to open.

Logic for reading rows and columns from a text file (textparser) C++

I'm really stuck with this problem I'm having for reading rows and columns from a text file. We're using text files that our prof gave us. I have the functionality running so when the user in puts "numrows (file)" the number of rows in that file prints out.
However, every time I enter the text files, it's giving me 19 for both. The first text file only has 4 rows and the other one has 7. I know my logic is wrong, but I have no idea how to fix it.
Here's what I have for the numrows function:
int numrows(string line) {
ifstream ifs;
int i;
int row = 0;
int array [10] = {0};
while (ifs.good()) {
while (getline(ifs, line)) {
istringstream stream(line);
row = 0;
while(stream >>i) {
array[row] = i;
row++;
}
}
}
}
and here's the numcols:
int numcols(string line) {
int col = 0;
int i;
int arrayA[10] = {0};
ifstream ifs;
while (ifs.good()) {
istringstream streamA(line);
col = 0;
while (streamA >>i){
arrayA[col] = i;
col++;
}
}
}
edit: #chris yes, I wasn't sure what value to return as well. Here's my main:
int main() {
string fname, line;
ifstream ifs;
cout << "---- Enter a file name : ";
while (getline(cin, fname)) { // Ctrl-Z/D to quit!
// tries to open the file whose name is in string fname
ifs.open(fname.c_str());
if(fname.substr(0,8)=="numrows ") {
line.clear();
for (int i = 8; i<fname.length(); i++) {
line = line+fname[i];
}
cout << numrows (line) << endl;
ifs.close();
}
}
return 0;
}
This problem can be more easily solved by opening the text file as an ifstream, and then using std::get to process your input.
You can try for comparison against '\n' as the end of line character, and implement a pair of counters, one for columns on a line, the other for lines.
If you have variable length columns, you might want to store the values of (numColumns in a line) in a std::vector<int>, using myVector.push_back(numColumns) or similar.
Both links are to the cplusplus.com/reference section, which can provide a large amount of information about C++ and the STL.
Edited-in overview of possible workflow
You want one program, which will take a filename, and an 'operation', in this case "numrows" or "numcols". As such, your first steps are to find out the filename, and operation.
Your current implementation of this (in your question, after editing) won't work. Using cin should however be fine. Place this earlier in your main(), before opening a file.
Use substr like you have, or alternatively, search for a space character. Assume that the input after this is your filename, and the input in the first section is your operation. Store these values.
After this, try to open your file. If the file opens successfully, continue. If it won't open, then complain to the user for a bad input, and go back to the beginning, and ask again.
Once you have your file successfully open, check which type of calculation you want to run. Counting a number of rows is fairly easy - you can go through the file one character at a time, and count the number that are equal to '\n', the line-end character. Some files might use carriage-returns, line-feeds, etc - these have different characters, but are both a) unlikely to be what you have and b) easily looked up!
A number of columns is more complicated, because your rows might not all have the same number of columns. If your input is 1 25 21 abs 3k, do you want the value to be 5? If so, you can count the number of space characters on the line and add one. If instead, you want a value of 14 (each character and each space), then just count the characters based on the number of times you call get() before reaching a '\n' character. The use of a vector as explained below to store these values might be of interest.
Having calculated these two values (or value and set of values), you can output based on the value of your 'operation' variable. For example,
if (storedOperationName == "numcols") {
cout<< "The number of values in each column is " << numColsVal << endl;
}
If you have a vector of column values, you could output all of them, using
for (int pos = 0; pos < numColsVal.size(); pos++) {
cout<< numColsVal[pos] << " ";
}
Following all of this, you can return a value from your main() of 0, or you can just end the program (C++ now considers no return value from main to a be a return of 0), or you can ask for another filename, and repeat until some other method is used to end the program.
Further details
std::get() with no arguments will return the next character of an ifstream, using the example code format
std::ifstream myFileStream;
myFileStream.open("myFileName.txt");
nextCharacter = myFileStream.get(); // You should, before this, implement a loop.
// A possible loop condition might include something like `while myFileStream.good()`
// See the linked page on std::get()
if (nextCharacter == '\n')
{ // You have a line break here }
You could use this type of structure, along with a pair of counters as described earlier, to count the number of characters on a line, and the number of lines before the EOF (end of file).
If you want to store the number of characters on a line, for each line, you could use
std::vector<int> charPerLine;
int numberOfCharactersOnThisLine = 0;
while (...)
{
numberOfCharactersOnThisLine = 0
// Other parts of the loop here, including a numberOfCharactersOnThisLine++; statement
if (endOfLineCondition)
{
charPerLine.push_back(numberOfCharactersOnThisLine); // This stores the value in the vector
}
}
You should #include <vector> and either specific std:: before, or use a using namespace std; statement near the top. People will advise against using namespaces like this, but it can be convenient (which is also a good reason to avoid it, sort of!)

Error copying and pasting data from a file to another

I am writing a code to merge multiple text files and output a single file.
There can be up to 22 input text files which contain 1400 lines each.
Each line has 8 bits of binary and the new line characters \n.
I am out putting a single file that has all 22 text files merged.
Problem is with my output file, after 1400 lines it appears that the content from the previous file is still being placed into output file(although the length of the previous file was 1400 lines). This extra content also begins to have additional line space between each row if opened by microsoft office or sublime, however it is interpreted as a single line if opened by notepad or excel(a single cell in excel).
Following is the picture of expected behaviour of the output file,
Here is a picture of abnormal behaviour. This starts when the first file finishes.
I know this data is from the first file still because the second file starts from 00000000
And here is the start of the second file,
And this abnormal behavior repeats every single time the files are switching.
My implementation to achieve this is as follows:
repeat:
if(user_input == 'y')
{
fstream data_out ("data.txt",fstream::out);
for(int i = 0; i<files_found; i++)
{
fstream data_in ((file_names[i].c_str()),fstream::in);
if(data_in.is_open())
{
data_in.seekg(0,data_in.end);
long size = data_in.tellg();
data_in.seekg(0,data_in.beg);
char * buffer = new char[size];
cout << size;
data_in.read(buffer,size);
data_out.write(buffer,size);
delete[] buffer;
}else
{
cout << "Unexpected error";
return 1;
}
data_in.close();
}
data_out.close();
}else if(user_input == 'n')
{
return 1;
}else
{
cout << "Input not recognised. Type y for Yes, and n for No";
cin >> user_input;
goto repeat;
}
Further information:
I have checked the size variable and it is as I expect, 14000.
8 bits, and a \ with n = 10 characters per line,
1400 rows x 10 = 14000.
Assuming reader of code to be experienced.
Sorry to bump this question, but I really like question that are marked as answered. JoachimPileborg answer seems to have worked for you:
Also, instead of seeking and checking sizes and allocating memory, why
not just do e.g. data_out << data_in.rdbuf();? This will copy the
whole input file to the output. – Joachim Pileborg Jul 29 at 17:26
A reference http://www.cplusplus.com/reference/ios/ios/rdbuf/ and an example:
#include <fstream>
#include <string>
#include <vector>
int main(int argc, char** argv)
{
typedef std::vector<std::string> Filenames;
Filenames vecFilenames;
// Populate the list of file names
vecFilenames.push_back("Text1.txt");
vecFilenames.push_back("Text2.txt");
vecFilenames.push_back("Text3.txt");
// Merge the files into Output.txt
std::ofstream fpOutput("Output.txt");
for (Filenames::iterator it = vecFilenames.begin();
it != vecFilenames.end(); ++it)
{
std::ifstream fpInput(it->c_str());
fpOutput << fpInput.rdbuf();
fpInput.close();
}
fpOutput.close();
return 0;
}

getline seems to not working correctly

Please tell me what am I doing wrong here. What I want to do is this:
1.Having txt file with four numbers and each of this numbers has 15 digits:
std::ifstream file("numbers.txt",std::ios::binary);
I'm trying to read those numbers into my array:
char num[4][15];
And what I'm thinking I'm doing is: for as long as you don't reach end of files write every line (max 15 chars, ending at '\n') into num[lines]. But this somewhat doesn't work. Firstly it reads correctly only first number, rest is just "" (empty string) and secondly file.eof() doesn't seems to work correctly either. In txt file which I'm presenting below this code I reached lines equal 156. What's going on?
for (unsigned lines = 0; !file.eof(); ++lines)
{
file.getline(num[lines],15,'\n');
}
So the whole "routine" looks like this:
int main()
{
std::ifstream file("numbers.txt",std::ios::binary);
char numbers[4][15];
for (unsigned lines = 0; !file.eof(); ++lines)
{
file.getline(numbers[lines],15,'\n');// sizeof(numbers[0])
}
}
This is contents of my txt file:
111111111111111
222222222222222
333333333333333
444444444444444
P.S.
I'm using VS2010 sp1
Do not use the eof() function! The canonical way to read lines is:
while( getline( cin, line ) ) {
// do something with line
}
file.getline() extracts 14 characters, filling in num[0][0] .. num[0][13]. Then it stores a '\0' in num[0][14] and sets the failbit on file because that's what it does when the buffer is full but terminating character not reached.
Further attempts to call file.getline() do nothing because failbit is set.
Tests for !file.eof() return true because the eofbit is not set.
Edit: to give a working example, best is to use strings, of course, but to fill in your char array, you could do this:
#include <iostream>
#include <fstream>
int main()
{
std::ifstream file("numbers.txt"); // not binary!
char numbers[4][16]={}; // 16 to fit 15 chars and the '\0'
for (unsigned lines = 0;
lines < 4 && file.getline(numbers[lines], 16);
++lines)
{
std::cout << "numbers[" << lines << "] = " << numbers[lines] << '\n';
}
}
tested on Visual Studio 2010 SP1
According to ifstream doc, reading stops either after n-1 characters are read or delim sign is found : first read would take then only 14 bytes.
It reads bytes : '1' (the character) is 0x41 : your buffer would be filled with 0x41 instead of 1 as you seem to expect, last character will be 0 (end of c-string)
Side note, your code doesn't check that lines doesn't go beyond your array.
Using getline supposes you're expecting text and you open the file in binary mode : seems wrong to me.
It looks like the '\n' in the end of the first like is not being considered, and remaining in the buffer. So in the next getline() it gets read.
Try adding a file.get() after each getline().
If one file.get() does not work, try two, because under the Windows default file encoding the line ends with '\n\r\' (or '\r\n', I never know :)
Change it to the following:
#include <cstring>
int main()
{
//no need to use std::ios_base::binary since it's ASCII data
std::ifstream file("numbers.txt");
//allocate one more position in array for the NULL terminator
char numbers[4][16];
//you only have 4 lines, so don't use EOF since that will cause an extra read
//which will then cause and extra loop, causing undefined behavior
for (unsigned lines = 0; lines < 4; ++lines)
{
//copy into your buffer that also includes space for a terminating null
//placing in if-statement checks for the failbit of ifstream
if (!file.getline(numbers[lines], 16,'\n'))
{
//make sure to place a terminating NULL in empty string
//since the read failed
numbers[lines][0] = '\0';
}
}
}