Problem in using seekg() to read file in C++ - c++

I am learning to read and write files in C++ and have run into a problem.
My test.txt file contains 3 strings on 3 lines:
abc
def
mnp
My problem is: I don't understand why I need to use f.seekg(2, ios::cur);
instead of f.seekg(1, ios::cur);
I know how to use seekg() in C++, and I thought I would only need to skip 1 byte
to get the next line with the getline() function.
This is my code:
ifstream f;
f.open("D:\\test.txt", ios::in);
string str1, str2, str3;
f >> str1;
f.seekg(2, ios::cur);
getline(f, str2);
getline(f, str3);
cout << str1 << " " << str2 << " " << str3 << endl;

The reason for your trouble is explained, for example, here:
Why does std::getline() skip input after a formatted extraction?
As for your actual question about seekg: you open the file in text mode. This means that when you read the file, line endings are delivered to your C++ code as a single character, '\n'. On disk, however, they may be something else, and it seems you are running your code on Windows. There, a newline in a text file is typically two bytes: CR (ASCII code 13) and LF (ASCII code 10). Reading or writing in text mode performs this conversion between the single character in your C++ string and the two bytes in the file for you.
seekg works on file offsets and does not take this conversion into account; the offsets are the same whether you open the file in text or binary mode. If you use seekg to skip a newline, your code becomes platform-dependent: on Windows you need to skip 2 bytes, as explained above, while on other platforms such as Unix you need to skip just a single byte.
So, do not use seekg for this purpose; see the linked question for better solutions.
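One portable alternative, sketched under the assumption that the goal is "one word with operator>>, then whole lines with getline": discard the rest of the current line with ignore() instead of seeking. The helper name read_word_then_lines is made up for this illustration and is not from the question.

```cpp
#include <istream>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read one whitespace-delimited word with operator>>,
// then read every remaining line with getline().
std::vector<std::string> read_word_then_lines(std::istream& in) {
    std::vector<std::string> parts;
    std::string word;
    if (!(in >> word)) {
        return parts;  // nothing to read
    }
    parts.push_back(word);

    // operator>> leaves the line ending in the stream. Drop everything up to
    // and including the next '\n'. In text mode the library has already
    // translated the platform's line ending to '\n', so no byte counting is needed.
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');

    std::string line;
    while (std::getline(in, line)) {
        parts.push_back(line);
    }
    return parts;
}
```

With the file from the question, passing an std::ifstream opened on test.txt should give the three strings abc, def and mnp on both Windows and Unix, with no seekg call.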

Related

How to append to the last line of a file in c++?

Using g++, I want to append some data to the last line of a file (without creating a new line). Probably a good idea would be to move the cursor back to skip the '\n' character in the existing file. However, this code does not work:
#include <iostream>
#include <fstream>
using namespace std;
int main() {
ofstream myfile;
myfile.open ("file.dat", fstream::app|fstream::out);
myfile.seekp(-1,myfile.ios::end); //I believe, I am just before the last '\n' now
cout << myfile.tellp() << endl; //indicates the position set above correctly
myfile << "just added"; //places the text IN A NEW LINE :(
//myfile.write("just added",10); //also, does not work correctly
myfile.close();
return 0;
}
Please give me an idea of how to correct the code. Thank you in advance. Marek.
When you open with app, every write goes to the end of the file, regardless of what tellp tells you.
("app" is for "append", which does not mean "write in an arbitrary location".)
You want ate (one of the more inscrutable names in C++), which seeks to the end only once, immediately after opening. Note that out combined with ate but without in still truncates the file, so to keep the existing contents the file has to be opened for both reading and writing (in | out | ate on an fstream).
You also need to write that final newline back, if you want to keep it.
And you probably also want to check that the last character really is a newline before overwriting it.
Also, seeking by characters can do strange things in text mode, and if you open in binary mode you need to worry about the platform's newline convention.
Manipulating text is much harder than you think.
(And by the way, you don't need to specify out on an ofstream; the "o" in "ofstream" takes care of that.)
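A minimal sketch of those points put together. It assumes the file is small enough to seek in freely and that its line endings are either "\n" or "\r\n"; the helper name append_to_last_line is invented for this example. The file is opened in binary mode so that offsets are exact, and whatever line ending was found is written back unchanged.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: append `text` to the last line of the file at `path`,
// keeping the file's trailing line ending ("\n" or "\r\n") if it has one.
bool append_to_last_line(const std::string& path, const std::string& text) {
    // in | out keeps the existing contents; ate starts at the end.
    std::fstream f(path, std::ios::in | std::ios::out |
                         std::ios::ate | std::ios::binary);
    if (!f) {
        return false;
    }
    const std::streamoff size = f.tellg();

    // Find out which line ending, if any, terminates the file.
    std::string ending;
    char c = 0;
    if (size >= 1) {
        f.seekg(-1, std::ios::end);
        f.get(c);
        if (c == '\n') {
            ending = "\n";
            if (size >= 2) {
                f.seekg(-2, std::ios::end);
                f.get(c);
                if (c == '\r') {
                    ending = "\r\n";
                }
            }
        }
    }

    // Step back over the line ending, write the text, then restore the ending.
    f.seekp(-static_cast<std::streamoff>(ending.size()), std::ios::end);
    f << text << ending;
    f.flush();
    return f.good();
}
```

Calling append_to_last_line("file.dat", "just added") on a file that ends in a newline should leave the new text on the former last line, followed by the same newline.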

How Can I Detect That a Binary File Has Been Completely Consumed?

If I do this:
ofstream output("foo.txt");
output << 13;
output.close();
ifstream input("foo.txt");
int dummy;
input >> dummy;
cout << input.good() << endl;
I'll get the result: "0"
However if I do this:
ofstream output("foo.txt", ios_base::binary);
auto dummy = 13;
output.write(reinterpret_cast<const char*>(&dummy), sizeof(dummy));
output.close();
ifstream input("foo.txt", ios_base::binary);
input.read(reinterpret_cast<char*>(&dummy), sizeof(dummy));
cout << input.good() << endl;
I'll get the result: "1"
This is frustrating to me. Do I have to resort to inspecting the ifstream's buffer to determine whether it has been entirely consumed?
Regarding
How Can I Detect That a Binary File Has Been Completely Consumed?
A slightly inefficient but easy to understand way is to measure the size of the file:
ifstream input("foo.txt", ios_base::binary);
input.seekg(0, ios_base::end); // go to end of the file
auto filesize = input.tellg(); // current position is the size of the file
input.seekg(0, ios_base::beg); // go back to the beginning of the file
Then check current position whenever you want:
if (input.tellg() == filesize)
cout << "The file was consumed";
else
cout << "Some stuff left in the file";
This way has some disadvantages:
Not efficient - goes back and forth in the file
Doesn't work with special files (e.g. pipes)
Doesn't work if the file is changed (e.g. you open your file in read-write mode)
Only works for binary files (seems your case, so OK), not text files
So better just use the regular way people do it, that is, try to read and bail if it fails:
if (input.read(reinterpret_cast<char*>(&dummy), sizeof(dummy)))
cout << "I have read the stuff, will work on it now";
else
cout << "No stuff in file";
Or (in a loop)
while (input.read(reinterpret_cast<char*>(&dummy), sizeof(dummy)))
{
cout << "Working on your stuff now...";
}
You are doing totally different things.
The operator>> is greedy and will read as much as possible into dummy. It so happens that while doing so, it runs into the end of file. That sets the input.eof(), and the stream is no longer good(). As it did find some digits before the end, the operation is still successful.
In the second read, you ask for a specific number of bytes (4 most likely) and the read is successful. So the stream is still good().
The stream interface doesn't predict the outcome of any future I/O, because in the general case it cannot know. If you use cin instead of input there might now be more to read, if the user continued typing.
Specifically, the eof() state doesn't appear until someone tries to read past end-of-file.
For text streams, since you wrote only the integer value, with neither a space nor an end of line after it, the library must try to read one character past the 1 and the 3 at read time, and it hits the end of the file. So good is false and eof is true.
For binary streams, you wrote 4 bytes (sizeof(int), assuming ints are 32 bits wide), and you read 4 bytes. Fine. No problem has occurred yet, so good is true and eof is false. Only the next read will hit the end of the file.
But beware. In the text example, if you open the text file in an editor and simply save it without changing anything, chances are that the editor automatically adds an end of line. In that case, the read will stop at the end of line and, as in the binary case, good will be true and eof false. The same happens if you write with output << 13 << std::endl;
All of this means that you must never assume that the element you just read is not the last one in the file when good is true and eof is false, because the end of file may only be hit on the next read, even if that read returns nothing.
TL;DR: the only foolproof way to know that there is nothing left in a file is to find that you are no longer able to read anything from it.
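The difference a trailing newline makes can be reproduced without touching the disk. The sketch below uses a std::istringstream in place of the file (an assumption made only to keep the example self-contained); StreamState and state_after_int are names invented for this illustration.

```cpp
#include <sstream>
#include <string>

// Snapshot of the stream flags after one extraction.
struct StreamState {
    bool good;
    bool eof;
    bool fail;
};

// Hypothetical helper: extract a single int from `text` with operator>>
// and report the state the stream is left in.
StreamState state_after_int(const std::string& text) {
    std::istringstream in(text);
    int value = 0;
    in >> value;
    return StreamState{in.good(), in.eof(), in.fail()};
}
```

state_after_int("13") reports eof set and good unset even though the extraction succeeded (fail is unset), while state_after_int("13\n") reports good set and eof unset, although nothing useful is left to read in either case.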
You do not need to resort to inspecting the buffer. You can determine whether the whole file has been consumed with:
cout << (input.peek() != char_traits<char>::eof()) << endl;
This uses peek, which:
Reads the next character from the input stream without extracting it
In the example, good is:
Returning false after the last extraction operation, because the int extraction operator has to keep reading until it finds a character that is not a digit. In this case it reaches the end of the file instead, and hitting the end of the file, even while looking for a delimiter, sets the stream's eofbit, so good returns false
Returning true after calling read, because read extracts exactly sizeof(int) bytes; even if the end of the file comes immediately afterwards, nothing tries to read it, so eofbit stays unset and good returns true
peek can be used after either of these and will correctly return char_traits<char>::eof() in both cases. Effectively this inspects the buffer for you, but with one vital distinction for binary files: if you inspected a binary file yourself as plain chars, you would find that it may contain bytes that look like EOF once stored in a char. (On most systems EOF is -1, which as a char is 0xFF; the binary representation of the int -1 contains 4 such bytes.) Looking only at the buffer's next char, you cannot tell whether that is actually the end of the file or not.
peek does not just return a char, though; it returns an int_type. If peek returns 0x000000FF, you are looking at a 0xFF byte, not the end of the file. If peek returns char_traits<char>::eof() (typically 0xFFFFFFFF), you are looking at the end of the file.
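A small sketch of the peek-based check, wrapped in a helper whose name (fully_consumed) is invented here. Two caveats worth keeping in mind: peek() itself sets eofbit when it reaches the end, and it also returns eof() on a stream that has already failed, so the helper only makes sense on an otherwise healthy stream.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: true when there is nothing left to read from `in`.
// A data byte of 0xFF is returned by peek() as 255, not as eof(), so binary
// content cannot be mistaken for the end of the file.
bool fully_consumed(std::istream& in) {
    return in.peek() == std::char_traits<char>::eof();
}
```

After the binary read in the question, fully_consumed(input) should return true even though input.good() was still true just before the call.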

How do I remove the character "ï»¿" from the beginning of a text file in C++?

I'm trying to read a text file and put each word into a node of a binary search tree. However, the first word is always read as "ï»¿" + first word. For example, if my first word is "This", then the first word that is inserted into my node is "ï»¿This". I've been searching the forum for a solution; there was one post asking about the same problem in Java, but no one has addressed it in C++. Would anyone help me fix it? Thank you.
I found a simple solution. I opened the file in Notepad and saved it as ANSI. After that, the file is read and passed into the binary search tree correctly.
That's UTF-8's BOM
You need to read the file as UTF-8. If you don't need Unicode and only use the ASCII range (code points 0 to 127), then save the file as ASCII or as UTF-8 without a BOM.
This is the byte order mark (BOM). What you see is the UTF-8 BOM as rendered in ISO-8859-1. You have to tell your editor not to write BOMs, or use a different editor to strip them out.
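If changing the file is not an option, the BOM can also be skipped while reading. The following is only a sketch: skip_utf8_bom is a made-up name, and it assumes a seekable stream such as a file or string stream, because it rewinds when no BOM is found.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: if the stream starts with the UTF-8 BOM (EF BB BF),
// consume it and return true; otherwise rewind to where we started.
bool skip_utf8_bom(std::istream& in) {
    const std::istream::pos_type start = in.tellg();
    char head[3] = {0, 0, 0};
    in.read(head, 3);
    if (in.gcount() == 3 &&
        static_cast<unsigned char>(head[0]) == 0xEF &&
        static_cast<unsigned char>(head[1]) == 0xBB &&
        static_cast<unsigned char>(head[2]) == 0xBF) {
        return true;
    }
    in.clear();       // a file shorter than 3 bytes sets eofbit and failbit
    in.seekg(start);  // no BOM: put the bytes back
    return false;
}
```

Calling skip_utf8_bom(file) once, right after opening the file and before the first `file >> word`, should make the first word come out as "This" rather than "ï»¿This".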
In C++, you can use the following function to convert a UTF-8 BOM file to ANSI.
#include <fstream>
#include <string>
#include <vector>
using namespace std;

// Not shown in this answer; signature inferred from the call below.
char* change_encoding_from_UTF8_to_ANSI(char* text);

void change_encoding_from_UTF8BOM_to_ANSI(const char* filename)
{
    ifstream infile(filename);
    string strLine;
    string strResult;
    if (infile && getline(infile, strLine))
    {
        // The first 3 bytes (EF BB BF) are the UTF-8 BOM; drop them if present.
        // All other bytes are assumed to be single-byte ASCII codes.
        if (strLine.compare(0, 3, "\xEF\xBB\xBF") == 0)
            strLine.erase(0, 3);
        strResult += strLine + "\n";
        // Loop on getline() itself: testing eof() first appends a spurious empty line.
        while (getline(infile, strLine))
            strResult += strLine + "\n";
    }
    infile.close();
    // Writable copy of the text, including the terminating '\0'.
    vector<char> changeTemp(strResult.begin(), strResult.end());
    changeTemp.push_back('\0');
    char* changeResult = change_encoding_from_UTF8_to_ANSI(&changeTemp[0]);
    strResult = changeResult;
    ofstream outfile(filename);
    outfile.write(strResult.c_str(), strResult.length());
    outfile.close();
}
In debug mode, find out the symbol for the special character and then replace it:
content.replaceAll("\uFEFF", "");

Having problems with 0x0A character in C++ even in binary mode. (interprets it as new file)

Hi, this might seem a bit noobish, but here we go. I'm developing a program that downloads leaderboards of a certain game from the internet and transforms them into a proper format to work with (elaborate rankings, etc).
The files contain the names, ordered by rank, but between each name there are 7 random control codes (obviously unprintable). The txt file looks like this:
..C...hName1..)...&Name2......)Name3..é...þName4..Ü...†Name5..‘...QName6..~...bName7..H...NName8..|....Name9..v...HName10.
I checked in a hex editor and saw that the first control code after each name is always a null character (0x00). So, what I do is read everything, and then cout every character. When a 0x00 character is found, skip 7 characters and keep couting. Therefore you end up with the list, right?
At first I had the problem that among those random control codes you would sometimes find a "soft EOF" (0x1A), and the program would stop reading there. So I finally figured out that I should open the file in binary mode. It worked, and then everything would be couted... or that's what I thought.
But I came across another file which still didn't work, and finally found out that there was an EOF character! (0x0A) Which doesn't make sense, since I'm opening it in binary mode. But still, after reading that character, C++ interprets that as a new file, and hence skips 7 characters, so the name after that character will always appear cut.
Here's my current code:
#include <cstdlib>
#include <iostream>
#include <fstream>
using namespace std;
int main () {
string scores;
system("wget http://certainwebsite/001.txt"); //download file
ifstream highin ("001.txt", ios::binary);
ofstream highout ("board.txt", ios::binary);
if (highin.is_open())
{
while ( highin.good() )
{
getline (highin, scores);
for (int i=0;i<scores.length(); i++)
{
if (scores[i]==0x00){
i=i+7; //skip 7 characters if 'null' is found
cout << endl;
highout << endl;
}
cout << scores[i];
highout << scores[i]; //cout names and save them in output file
}
}
highin.close();
}
else cout << "Unable to open file";
system("pause>nul");
}
I'm not sure how to ignore that character if opening the file in binary mode doesn't help. Sorry for the long question, but I wanted to be detailed and specific. In this case, the EOF character is located before Name3, and hence this is what the output looks like:
http://i.imgur.com/yu1NjoZ.png
By default getline() reads until the end of the line and discards the newline character. Binary mode only turns off newline translation; getline() still treats '\n' (0x0A) as its delimiter, which is why a 0x0A byte among the control codes splits the input. The delimiter character can be customized, however, by supplying a third parameter. If you wish to read until the null character (not until the end of line), you could try using getline (highin, scores, '\0'); (and adjusting the logic for skipping the characters).
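A sketch of what the adjusted logic might look like. It assumes the layout described in the question (every name is followed by 7 control bytes, the first of which is 0x00, and names themselves never contain 0x00); read_names is a name made up for this example.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read names separated by blocks of 7 control bytes,
// where the first byte of every block is 0x00.
std::vector<std::string> read_names(std::istream& in) {
    std::vector<std::string> names;
    std::string name;
    // getline() with '\0' as delimiter reads up to the null byte and discards
    // it, so bytes such as 0x0A inside the control block no longer split input.
    while (std::getline(in, name, '\0')) {
        if (!name.empty()) {
            names.push_back(name);  // the block before the first name yields ""
        }
        in.ignore(6);  // skip the remaining 6 control bytes of the block
    }
    return names;
}
```

The stream should still be opened with ios::binary, as in the question, so that a 0x1A byte is not treated as end of input on Windows.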
I'm glad you figured it out, and it doesn't surprise me that getline() was the culprit. I had a similar issue dealing with the newline character when I was trying to read in a CSV file. There are several different getline() functions in C++ (the free function std::getline for std::string and the istream::getline member function for character arrays), and each seems to handle the newline character a little differently.
As a side note, in your for loop, I'd recommend against performing a method call in your test. That adds unnecessary overhead to the loop. It'd be better to call the method once and put that value into a variable, then enter the loop and test i against the length variable. Unless you expect the length to change, calling the length() method each iteration is a waste of system resources.
Thank you all guys, it worked, it was the getline() which was giving me problems indeed. Due to the 'while' loop, each time it found a new line character, it restarted the process, hence skipping those 7 characters.

How to change EOL coding in txt files?

I am writing in Visual Studio 2008 in C++ and I have problems with other libraries - they do not accept the line endings (EOL) I generate with my txt files.
How can I change that while writing a file with
std::ofstream myFile;
myFile.open("traindata.txt");
myFile << "stuff" << endl;
// or
//myFile << "stuff" << '\n';
myFile.close();
EDIT 2:
OK, I made a mistake in my code: I was appending "0 " on every iteration, so I had whitespace before the EOL.
My bad. You guys were right. Thanks for the help.
Is it possible that you just don't want the conversion from \n to the end-of-line sequence to happen? Open your file using std::ios_base::binary: this turns off precisely that conversion. Also, don't use std::endl unless you really want to flush the stream:
std::ofstream myFile("traindata.txt", std::ios_base::binary);
myFile << "stuff\n";
The close() is typically also unnecessary unless you want to check that it was successful.
You should open the file stream in binary mode to keep the C++ library from converting line endings automatically:
myFile.open("traindata.txt", std::ios_base::out|std::ios_base::binary);
That will keep the C++ library from converting '\n' to the OS-specific EOL sequence (CR-LF on Windows).
Download notepad++ - that should be able to fix the problem.
Or dos2unix, unix2dos on cygwin
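Use myfile << "\r\n" for the Windows-style ending, or myfile << "\n" for the UNIX-style. Do this with the file opened in binary mode: in text mode on Windows the "\n" is itself expanded to CR-LF, so "\r\n" would end up on disk as CR-CR-LF.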
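Putting the binary-mode advice and the explicit line ending together, a minimal sketch could look like this. The helper name write_lines and its parameters are assumptions made for the example, not part of any answer above.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: write every line followed by exactly `eol`.
// Binary mode guarantees that the bytes in `eol` reach the disk unchanged.
bool write_lines(const std::string& path,
                 const std::vector<std::string>& lines,
                 const std::string& eol) {
    std::ofstream out(path, std::ios::binary);
    if (!out) {
        return false;
    }
    for (const std::string& line : lines) {
        out << line << eol;  // no std::endl, so no flush per line
    }
    out.flush();
    return out.good();
}
```

For example, write_lines("traindata.txt", lines, "\n") should produce UNIX-style endings on every platform, and passing "\r\n" should produce Windows-style endings on every platform.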