Read a file line by line in C++ - c++

I wrote the following C++ program to read a text file line by line and print its content line by line. I passed the name of the text file as the only command-line argument.
#include <iostream>
#include <fstream>
using namespace std;
int main(int argc, char* argv[])
{
char buf[255] = {};
if (argc != 2)
{
cout << "Invalid number of files." << endl;
return 1;
}
ifstream f(argv[1], ios::in | ios::binary);
if (!f)
{
cout << "Error: Cannot open file." << endl;
return 1;
}
while (!f.eof())
{
f.get(buf,255);
cout << buf << endl;
}
f.close();
return 0;
}
However, when I ran this code in Visual Studio, the Debug Console was completely blank. What's wrong with my code?

Apart from the errors mentioned in the comments, the program has a logical error because istream& istream::get(char* s, streamsize n) does not do what you (or I, until I debugged it) thought it does. Yes, it reads to the next newline; but it leaves the newline in the input!
The next time you call get(), it will see the newline immediately and return with an empty line in the buffer, for ever and ever.
The best way to fix this is to use the appropriate function, namely istream::getline() which extracts, but does not store the newline.
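To make the difference concrete, here is a minimal sketch that runs both functions over the same in-memory text. The helper names read_with_get and read_with_getline are made up for this illustration, and an std::istringstream stands in for the file. With a standard-conforming library, the second get() call stores no characters and therefore sets failbit; a loop that only tests eof(), as in the question, never notices that and keeps spinning.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads with get(char*, n), which stops in front of
// the '\n' and leaves it in the stream.
std::vector<std::string> read_with_get(std::istream& in)
{
    std::vector<std::string> lines;
    char buf[255];
    while (in.get(buf, sizeof buf))   // the second call stores nothing and fails
        lines.push_back(buf);
    return lines;
}

// Hypothetical helper: reads with getline(char*, n), which extracts the
// '\n' but does not store it.
std::vector<std::string> read_with_getline(std::istream& in)
{
    std::vector<std::string> lines;
    char buf[255];
    while (in.getline(buf, sizeof buf))
        lines.push_back(buf);
    return lines;
}
```

Feeding "one\ntwo\nthree\n" to read_with_get yields only the first line, while read_with_getline returns all three.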
The EOF issue
is worth mentioning. The canonical way to read lines (if you want to write to a character buffer) is
while (f.getline(buf, bufSz))
{
cout << buf << "\n";
}
getline() returns a reference to the stream which in turn has a conversion function to bool, which makes it usable in a boolean expression like this. The conversion is true if input could be obtained. Interestingly, it may have encountered the end of file, and f.eof() would be true; but that alone does not make the stream convert to false. As long as it could extract at least one character it will convert to true, indicating that the last input operation made input available, and the loop will work as expected.
The next read after encountering EOF would then fail because no data could be extracted: after all, the read position is still at EOF. That is considered a read failure. The condition then evaluates to false and the loop is exited, which was exactly the intent.
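A small sketch of that behaviour, assuming an in-memory stream instead of a file. ReadResult and count_lines are names invented for this example. It records whether eof() was already set right after the last successful getline() call.

```cpp
#include <istream>
#include <sstream>

// Hypothetical result type for this illustration.
struct ReadResult {
    int lines = 0;                // number of successful getline() calls
    bool eof_after_last = false;  // eof() right after the last successful call
};

ReadResult count_lines(std::istream& in)
{
    ReadResult r;
    char buf[64];
    while (in.getline(buf, sizeof buf)) {  // true as long as input was obtained
        ++r.lines;
        r.eof_after_last = in.eof();
    }
    return r;
}
```

For the input "a\nb" (no trailing newline) the last line is still counted although eof() is already true at that point; for "a\nb\n" the count is the same but eof() only becomes true on the following, failing read.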
The buffer size issue
is worth mentioning, as well. The standard draft says in 30.7.4.3:
Characters are extracted and stored until one of the following occurs:
end-of-file occurs on the input sequence (in which case the function calls setstate(eofbit));
traits::eq(c, delim) for the next available input character c (in which case the input character is extracted but not stored);
n is less than one or n - 1 characters are stored (in which case the function calls setstate(failbit)).
The conditions are tested in that order, which means that if n-1 characters have been stored and the next character is a newline (the default delimiter), the input was successful (and the newline is extracted as well).
This means that if your file contains a single line 123 you can read that successfully with f.getline(buf, 4), but not a line 1234 (both may or may not be followed by a newline).
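The 123 versus 1234 case can be checked with a few lines. This is only a sketch: fits_in_four is a made-up helper, and the text is read from an std::istringstream rather than a file.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns true if the first line of `text` can be read
// with a single getline() call into a 4-character buffer (3 characters + '\0').
bool fits_in_four(const std::string& text)
{
    std::istringstream in(text);
    char buf[4];
    return static_cast<bool>(in.getline(buf, sizeof buf));
}
```

fits_in_four("123") and fits_in_four("123\n") both succeed, while "1234" and "1234\n" set failbit because three characters have been stored and the next character is neither end-of-file nor the delimiter.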
The line ending issue
Another complication here is that on Windows a file created with a typical editor will have a hidden carriage return before the newline, i.e. a line actually looks like "123\r\n" ("\r" and "\n" each being a single character with the values 13 and 10, respectively). Because you opened the file with the binary flag the program will see the carriage return; all lines will contain that "invisible" character, and the number of visible characters fitting in the buffer will be one shorter than one would assume.
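One common way to cope with this is to strip a trailing carriage return after reading each line. The following is a sketch under the assumption that lines are read into std::string; read_lines_strip_cr is a name chosen for this example.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads all lines and removes a trailing '\r', so that
// "123\r\n" and "123\n" both produce the string "123".
std::vector<std::string> read_lines_strip_cr(std::istream& in)
{
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();          // drop the "invisible" carriage return
        lines.push_back(line);
    }
    return lines;
}
```

Opening the file without ios::binary is another option on Windows, since text mode translates the line endings there.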
The console issue ;-)
Oh, and your Console was not entirely empty; it's just that modern computers are too fast and the first line which was probably printed (it was in my case) scrolled away faster than anybody could switch windows. When I looked closely there was a cursor in the bottom left corner where the program was busy printing line after line of nothing ;-).
The conclusion
Debug your programs. It's very easy with VS.
Use getline(istream, string).
Use the return value of input functions (typically the stream) as a boolean in a while loop: "As long as you can extract any input, use that input."
Beware of line ending issues.
Consider C I/O (printf, scanf) for anything non-trivial (I didn't discuss this in my answer but I think that's what many people do).
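Putting the second and third points together, the read loop could look roughly like this. copy_lines is a name invented for this sketch; it takes generic streams so it works with a file stream and std::cout as well as with string streams.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: copies `in` to `out` line by line and returns the
// number of lines read. The std::string grows as needed, so there is no
// buffer size to get wrong.
std::size_t copy_lines(std::istream& in, std::ostream& out)
{
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line)) {  // the read itself is the loop condition
        out << line << '\n';
        ++count;
    }
    return count;
}
```

In the program from the question this would be called as copy_lines(f, cout) after the file has been opened successfully.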

Related

EOF - scanf and printf

I'm trying to do a simple exercise here, but I need to understand how EOF works first.
void main()
{
char s1[1000];
while (scanf("%s", s1)!=EOF)
;
printf("%s",s1);
}
The idea is to have multiple lines in input, and display them.
The problem I have is that if I put
Hello World
This is stackoverflow
When printf is called, it only prints
stackoverflow
Why isn't it printing everything and how do I make it print?
Regards
Remove the semicolon ;:
while (scanf("%s", s1)!=EOF)
printf("%s",s1);
Note that this will still exhibit odd behavior at end of file depending on how it ends exactly. Furthermore, it splits the input into words, which are separated by spaces or new lines. You may want to simply split into lines.
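So you may be better served with for instance:
while (fgets(s1, sizeof s1, stdin) != NULL)
fputs(s1, stdout);
This code fragment reads your input line by line until end-of-file. (fgets is used here rather than gets, which cannot limit how much it writes into the buffer and was removed from the language in C11.)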
To read everything (or as much as your buffer can hold), you can use:
char s1[1000] = "";
fread(s1, sizeof(s1) - 1, 1, stdin);
puts(s1);
However, my preferred method of reading a text file is:
using namespace std;
string line;
while (getline(cin, line))
{
cout << line << endl;
}
That is because usually I want to process a file line by line, and getline with a string ensures the line buffer is always big enough.
You probably want this:
char s1[1000][20];
int i = 0 ;
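/* use the result of fgets as the loop condition and stay within the array */
while (i < 1000 && fgets(s1[i], 20, stdin) != NULL)
i++;
int j ;
for (j = 0; j < i; j++)
printf("%s", s1[j]); /* fgets keeps the newline, so none is added here */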
Here you can enter at most 1000 lines that are maximum 19 characters long.
What you have is a loop that reads words into a buffer until it reaches EOF (and does nothing with those words), followed by a printf to print the contents of the buffer. The printf is after the loop (not in it), so executes once after the loop completes. At that time, the buffer will contain the last word read, so that is what gets printed.
The EOF return test means "nothing more to be read", which isn't necessarily an end of file (might be an error condition of some kind), but in practice that distinction can be ignored. Looping until your reading function returns EOF or NULL (depends on function) is good practice.
If you want to print each word as it is read, you need to put a printf in the loop.
If you want to store the words for later processing, you need to store them somewhere. That means declaring some storage space, or allocating space on the heap, and some bookkeeping to track how much space you've used/allocated.
If you want lines rather than words, you should use fgets instead of scanf("%s". Note that fgets returns NULL rather than EOF when there's nothing more to be read.
Because it only prints the last thing that is read from the file ("stackoverflow"). This is caused by the semicolon after the end of your while(...); - this means that you are doing while(...) { /* do nothing */} - which is probably not what you wanted
Also, printf("%s",s1)!='\0'; makes no sense at all. For one thing, printf returns the number of characters printed - '\0' is the value zero written as a character constant. And of course, doing != 0 of the result without some sort of use of the comparison is pretty much pointless too.
Use fgets instead of scanf if you want to read one line at a time. scanf will stop reading when it finds whitespace. fgets will read till the end of the line.
Use fgets(). Simple and sweet
char buf[1000];
while (fgets(buf, sizeof buf, stdin) != NULL) {
fputs(buf, stdout);
}
Here is how end-of-file works in C. The input channels are called input streams; disk files and stdin are both input streams. The "end-of-file" state is a flag that a stream has, and that flag is triggered when you try to read from a stream, but it turns out there are no more characters in the stream, and there never will be any more. (If the stream is still active but just waiting for user input for example, it is not considered to be end-of-file; read operations will block).
Streams can have other error states, so looping until "end-of-file" is set is usually wrong. If the stream does go into an error state then your loop will never exit (aka. "infinite loop").
The end-of-file state can be checked by feof. However, some input operations also can signal an error as well as, or instead of, returning the actual data they were intended to read. These functions can return the value EOF. Usually these functions return EOF in both cases: end-of-file, and stream error. This is different to feof which only returns true in the case of end-of-file.
For example, getchar() and scanf will return EOF if it was end-of-file, but also if the stream is in an error state.
So it is OK to use getchar()'s result as a loop condition, but not feof on its own.
Also, it is sometimes not OK to use scanf() != EOF as a loop condition. It's possible that there is no stream error, but just that the data you requested wasn't there. For example, if you scan for "%d" but there are letters in the stream. Instead, it's better to check for successful conversion (scanf returns the number of successful conversions it performed). Then when you exit your loop, you can go on to call feof and ferror to see whether it was due to end-of-file, or error, or just unexpected input.
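The last point can be sketched as follows, written as C++ using the C I/O functions from <cstdio>. StopReason, SumResult and sum_ints are names made up for this illustration; the loop checks the number of successful conversions and only afterwards asks feof and ferror why it stopped.

```cpp
#include <cstdio>

// Hypothetical types for this illustration.
enum class StopReason { EndOfFile, StreamError, UnexpectedInput };

struct SumResult {
    long sum;
    StopReason why;
};

// Sums whitespace-separated integers from `fp` until a read does not
// produce exactly one successful conversion.
SumResult sum_ints(std::FILE* fp)
{
    SumResult r{0, StopReason::UnexpectedInput};
    int value;
    while (std::fscanf(fp, "%d", &value) == 1)  // 1 == one successful conversion
        r.sum += value;

    if (std::ferror(fp))
        r.why = StopReason::StreamError;
    else if (std::feof(fp))
        r.why = StopReason::EndOfFile;
    // otherwise: the data was there but did not match "%d"
    return r;
}
```

For the input "1 2 x 3" the loop stops at the letter with neither end-of-file nor an error set, which a plain comparison against EOF would not detect.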

Why is the c++ input file stream checked twice here?

Here is a snippet from a c++ tutorial:
// istream::get example
#include <iostream> // std::cin, std::cout
#include <fstream> // std::ifstream
int main () {
char str[256];
std::cout << "Enter the name of an existing text file: ";
std::cin.get (str,256); // get c-string
std::ifstream is(str); // open file
while (is.good()) // loop while extraction from file is possible
{
char c = is.get(); // get character from file
if (is.good())
std::cout << c;
}
is.close(); // close file
return 0;
}
Notice is.good() appeared twice, first with while, then with if.
Link to the example: http://www.cplusplus.com/reference/istream/istream/get/
Why is the c++ input file stream checked twice here?
The fact of the matter is that it is unnecessarily checked twice. If the second inner if (is.good()) passes, then the outer while (is.good()) will always pass as well. The author of the code needed some way of looping, and he incorrectly assumed that a while (is.good()) is an appropriate condition because it will stop the loop when the stream fails to extract. But this is only half-true. while (is.good()) is never the correct way to perform the extraction.
You have to perform the input first and then check if it succeeded. Otherwise it is possible to perform a failed extraction, use the result of that extraction and receive unwanted behavior from your program. The correct way to do it is by using the extraction itself as the condition. The input function returns a reference to the stream, which is then converted to a boolean: true if the read succeeded, false otherwise:
char c;
while (is.get(c))
{
std::cout << c;
}
The variable c is also not needed outside of the loop. You can enclose the while() loop in a block or use a for() loop instead:
for (char c; is.get(c); )
{
std::cout << c;
}
But it seems that this code is attempting to write all the content from the file to standard output. Reading a character one-by-one is the way shown here, but you can also use stream iterators or the buffer overload of std::ostream::operator<<() as well.
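The two alternatives mentioned above could look roughly like this. The function names copy_via_rdbuf and copy_via_iterators are invented for this sketch, and both take generic streams so they can be tried with string streams as well as with a file and std::cout.

```cpp
#include <algorithm>
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>

// Hypothetical helper: copies everything using the stream-buffer overload
// of operator<<.
void copy_via_rdbuf(std::istream& in, std::ostream& out)
{
    out << in.rdbuf();
}

// Hypothetical helper: copies everything character by character using
// stream-buffer iterators.
void copy_via_iterators(std::istream& in, std::ostream& out)
{
    std::copy(std::istreambuf_iterator<char>(in),
              std::istreambuf_iterator<char>(),
              std::ostreambuf_iterator<char>(out));
}
```

One caveat with the rdbuf() variant: if the input is empty, so that no characters are inserted, the output stream's failbit is set.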
There are two more problems I see in this code. Namely:
std::string is the preferred construct for manipulating dynamically-sized strings, not C-style strings which require the use of archaic input methods such as .get(), .getline(), etc, and their respective overloads.
Manually closing a file is usually unneeded. The stream will close itself at the end of the scope in which it was created. You probably only want to close the file yourself to check if it succeeds or to reopen the stream with a different file or openmode.
The first one, in while (is.good()), checks whether the stream is still in a good state, i.e. no earlier read has reached EOF (End Of File) or failed. If it is not, the loop is not entered. Note that a good state does not guarantee that another character is available for char c = is.get();, because EOF is only detected when a read actually runs into it.
The second if() prevents the result of that last read from being printed: after char c = is.get(); the stream may have reached EOF, and in that case c does not hold a character from the file, so it is not printed.
For example, let's say you have this file:
"Just an example!"
Now, if you had just:
while (is.good()) // loop while extraction from file is possible
{
char c = is.get(); // get character from file
std::cout << c;
}
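the output would be: "Just an example! ". The extra trailing character is not part of the file: it is the EOF value returned by the last, failed is.get(), converted to char. How it is displayed depends on the terminal.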
But with:
while (is.good()) // loop while extraction from file is possible
{
char c = is.get(); // get character from file
if (is.good())
std::cout << c;
}
the output would be: "Just an example!", which is what you would expect it to be.

Error reading and printing a text file with C++

I have a bug in my code (the code is at the end of the question). The purpose of my C++ executable is to read a file that contains numbers, copy them into a std::vector and then just print the contents to stdout. Where is the problem? (atoi?)
I have a simple text file that contains the following numbers (each line has one number)
mini01:algorithms ios$ cat numbers.txt
1
2
3
4
5
When I execute the program I receive one more line:
mini01:algorithms ios$ ./a.out
1
2
3
4
5
0
Why do I get the 6th line in stdout?
#include <iostream>
#include <string>
#include <fstream>
#include <vector>
using namespace std;
void algorithm(std::vector<int>& v) {
for(int i=0; i < v.size(); i++) {
cout << v[i] << endl;
}
}
int main(int argc, char **argv) {
string line;
std::vector<int> vector1;
ifstream myfile("numbers.txt");
if ( myfile.is_open()) {
while( myfile.good() )
{
getline(myfile, line);
vector1.push_back(atoi(line.c_str()));
}
myfile.close();
}
else {
cout << "Unable to open file" << endl;
}
algorithm(vector1);
return 0;
}
You should not use while (myfile.good()), as it will loop one time too many.
Instead use
while (getline(...))
The reason you can't use the flags to check for looping, is that they don't get set until after an input/output operation notices the problem (error or end-of-file).
Don't use good() as the condition of your extraction loop. It does not accurately indicate whether the next read will succeed or not. Move your call to getline into the condition:
while(getline(myfile, line))
{
vector1.push_back(atoi(line.c_str()));
}
The reason it is failing in this particular case is because text files typically have an \n at the end of the file (that is not shown by text editors). When the last line is read, this \n is extracted from the stream. Yes, that may be the very last character in the file, but getline doesn't care to look any further than the \n it has extracted. It's done. It does not set the EOF flag or do anything else to cause good() to return false.
So at the next iteration, good() is still true, the loop continues and getline attempts to extract from the file. However, now there's nothing left to extract and you just get line set to an empty string. This then gets converted to an int and pushed into the vector1, giving you the extra value.
In fact, the only robust way to check if there is a problem with extraction is to check the stream's status bits after extracting. The easiest way to do this is to make the extraction itself the condition.
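The difference between the two loops can be reproduced without a file. In this sketch the function names are made up and an std::istringstream replaces numbers.txt; the first function keeps the flawed good() loop from the question for comparison.

```cpp
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Flawed loop from the question, kept for comparison: good() is checked
// before the read that actually runs into end-of-file.
std::vector<int> read_numbers_good_loop(std::istream& in)
{
    std::vector<int> v;
    std::string line;
    while (in.good()) {
        std::getline(in, line);              // may fail, result is used anyway
        v.push_back(std::atoi(line.c_str()));
    }
    return v;
}

// Extraction as the loop condition: the body only runs after a successful read.
std::vector<int> read_numbers(std::istream& in)
{
    std::vector<int> v;
    std::string line;
    while (std::getline(in, line))
        v.push_back(std::atoi(line.c_str()));
    return v;
}
```

With the input "1\n2\n3\n4\n5\n" the first function returns six elements, the last one being 0, while the second returns the expected five.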
You read one too many lines, since the while condition only becomes false AFTER you have had a "bad read".
Welcome to the wonderful world of C++. Before we get to the bug, I would advise you to drop the std:: namespace qualification when defining or declaring a vector, as you already have
using namespace std;
A second piece of advice would be to use the pre-increment operator ++i instead of i++ wherever feasible. You can see more details on that here.
Coming to your problem in itself, the issue is an empty new line being read at the end of file. A simple way to avoid this would be to check the length of line before using it.
getline(myfile, line);
if (line.size()) {
vector1.push_back(atoi(line.c_str()));
}
This would also enable your program to read a file interspersed with empty lines. To make it more foolproof, you can check the line for any non-numeric characters before using atoi on it. However, the best solution, as mentioned, is to move the read into the loop condition.

.get function with an opened file stream in C++

I have an open input file stream. It is able to open the other file (a txt file) successfully. And by making adjustments to the code right below I can get it to read and output the other txt file (all ASCII characters, just letters) just fine. However, I was playing around with the function below. This results in one line being read, when there are in fact three lines. I want to know why. The size of the array is not the problem, i.e., making it larger does not seem to fix anything.
void DispFile(fstream& inFile)
{
char fileChar[256];
while (inFile.get(fileChar,256))
{
cout << fileChar;
}
}
Here is the code that WORKS:
void DispFile(fstream& inFile)
{
char fileChar[256];
while (inFile.getline(fileChar,256))
{
cout << fileChar;
cout << endl;
}
}
OR
void DispFile(fstream& inFile)
{
char file;
while (inFile.get(file))
{
cout << file;
}
}
So why does using inFile.get(array, dimension) result in only one line being read, while the others work like a charm (so to speak).
In the first version, .get(array, size) extracts characters until it reaches the delimiting character. By default this is a newline, '\n'. However, once it reaches this character, it does not extract it from the input stream but leaves it for the next input attempt. Therefore the next time you call get() it will find the newline from the previous get() and immediately stop without storing any characters, which makes the call fail and ends the loop.
The .getline() works because it extracts the newline and the .get works because it simply gets each character one at a time until the end of file.
get(char*, int) reads until the delimiter ('\n') but does not extract it - so it gets "stuck" on it. getline removes it.
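If you want to stay with get(array, size) for some reason, the leftover newline has to be consumed by hand. The following is only a sketch with a made-up function name, using std::istream so it can be tried on a string stream. Note that it still stops at the first empty line, because get() fails when it stores no characters; getline() remains the simpler choice.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads lines with get(char*, n) and removes the
// delimiter that get() leaves behind.
std::vector<std::string> read_with_get_and_ignore(std::istream& in)
{
    std::vector<std::string> lines;
    char buf[256];
    while (in.get(buf, sizeof buf)) {
        lines.push_back(buf);
        if (in.peek() == '\n')
            in.ignore();      // skip the newline so the next get() is not stuck
    }
    return lines;
}
```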

functionality of cin in c++

I'm a bit confused by the results of the following function:
int main() {
string command;
while(1) {
cin >> command;
if(command == "end")
return 0;
else
cout << "Could you repeat the command?" << endl;
}
return 0;
}
First of all - the output line ("could you...") repeats once for each individual word in the input (stored in command). So far as I can see, it should only be possible for it to happen once for each instance of the loop.
Also, when the line 'if(command == "end")' is changed to 'if(command == "that's all")' it never triggers. A little testing suggested that all of the whitespace was removed from the command.
Could someone explain to me what's going on here?
Thanks
The formatted input operator>>() reads whitespace-separated tokens from input. If you want to read whole lines, use the getline() function:
string command;
getline( cin, command );
Most (possibly all) operating systems buffer input. When you type a string of words and then hit [enter] it is only at the time you hit enter that the input is usually passed to your program. Thus that is when it will start reading the input and separating it out into individual words (because as Neil mentions, the >> reads words, not lines). Thus your program goes through the loop multiple times (once per word you had in the line) even though you only hit enter once.
So, you are correct when you think it should only display "could you..." once per loop. That is what is happening.
Likewise, you'll never have a command that contains more than one word because of the space delimiter. As mentioned, use getline() to retrieve the entire text for the line you entered.
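A short sketch of the difference, with an std::istringstream standing in for cin and function names chosen for this example:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: operator>> splits the input at whitespace.
std::vector<std::string> read_words(std::istream& in)
{
    std::vector<std::string> words;
    std::string word;
    while (in >> word)
        words.push_back(word);
    return words;
}

// Hypothetical helper: getline() keeps each line, including spaces, in one piece.
std::vector<std::string> read_lines(std::istream& in)
{
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line))
        lines.push_back(line);
    return lines;
}
```

For the input "that's all\nend\n", read_words produces three tokens ("that's", "all", "end"), so a comparison with "that's all" can never match, whereas read_lines produces the two lines as typed.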