What's the difference between read, readsome, get, and getline? - c++

What is the difference between these functions? When I use them they all seem to do the same thing. For example, all four calls return "hello":
#include <iostream>
#include <sstream>
int main()
{
std::stringstream ss("hello");
char x[10] = {0};
ss.read(x, sizeof(x)); // #1
std::cout << x << std::endl;
ss.clear();
ss.seekg(0, ss.beg);
ss.readsome(x, sizeof(x)); // #2
std::cout << x << std::endl;
ss.clear();
ss.seekg(0, ss.beg);
ss.get(x, sizeof(x)); // #3
std::cout << x << std::endl;
ss.clear();
ss.seekg(0, ss.beg);
ss.getline(x, sizeof(x)); // #4
std::cout << x << std::endl;
}

get and getline are quite similar when get is called with the parameters (char_type* s, std::streamsize count). However, get reads from the stream until a delimiter is found and then leaves the delimiter in the stream. getline, by comparison, pulls the delimiter off the stream but then drops it; it is not added to the buffer it fills.
get looks for \n by default, and when a specific number of characters is provided in an argument (say, count) it will read up to count - 1 characters before stopping, which leaves room for the terminating null character. read will pull in all count of them and does not null-terminate the buffer.
You could envisage read as being an appropriate action on a binary datasource, reading a specific number of bytes. get would be more appropriate on a text stream, when you're reading into a string that you'd like null-terminated, and where things like newlines have useful syntactic meanings splitting up text.
readsome only returns characters that are immediately available in the underlying buffer, something which is a bit nebulous and implementation specific. This probably includes characters returned to the stream using putback, for example. The fact that you can't see the difference between read and readsome just shows that the two might share an implementation on the particular stream type and library you are using.
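To make the difference visible, here is a minimal sketch that runs get, getline, and read against the same input, "ab\ncd", and looks at what is left behind. The helper names (next_after_get, next_after_getline, raw_read) are made up for this illustration. readsome is left out on purpose because its result depends on the implementation.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Character that the next input operation would see after get().
char next_after_get() {
    std::istringstream in("ab\ncd");
    char buf[10] = {};
    in.get(buf, sizeof buf);       // stores "ab", leaves '\n' in the stream
    return static_cast<char>(in.peek());
}

// Character that the next input operation would see after getline().
char next_after_getline() {
    std::istringstream in("ab\ncd");
    char buf[10] = {};
    in.getline(buf, sizeof buf);   // stores "ab", extracts and drops '\n'
    return static_cast<char>(in.peek());
}

// read() ignores delimiters and copies exactly the requested byte count.
std::string raw_read() {
    std::istringstream in("ab\ncd");
    char buf[10] = {};
    in.read(buf, 5);               // copies 5 bytes, newline included
    return std::string(buf, static_cast<std::size_t>(in.gcount()));
}
```

With a single-line input such as "hello" none of these differences show up, which is why the calls in the question look identical.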

I've observed the difference between read() and readsome() on a flash filing system.
The underlying stream reads 8k blocks and the read method will go for the next block to satisfy the caller, whereas the readsome method is allowed to return less than the request in order to avoid spending time fetching the next block.

The main difference between get() and getline() is that get() leaves the newline character in the input stream, making it the first character seen by the next input operation, whereas getline() extracts and discards the newline character from the input stream.

Related

Read a file line by line in C++

I wrote the following C++ program to read a text file line by line and print out the content of the file line by line. I entered the name of the text file as the only command line argument into the command line.
#include <iostream>
#include <fstream>
using namespace std;
int main(int argc, char* argv[])
{
char buf[255] = {};
if (argc != 2)
{
cout << "Invalid number of files." << endl;
return 1;
}
ifstream f(argv[1], ios::in | ios::binary);
if (!f)
{
cout << "Error: Cannot open file." << endl;
return 1;
}
while (!f.eof())
{
f.get(buf,255);
cout << buf << endl;
}
f.close();
return 0;
}
However, when I ran this code in Visual Studio, the Debug Console was completely blank. What's wrong with my code?
Apart from the errors mentioned in the comments, the program has a logical error because istream& istream::get(char* s, streamsize n) does not do what you (or I, until I debugged it) thought it does. Yes, it reads to the next newline; but it leaves the newline in the input!
The next time you call get(), it will see the newline immediately and return with an empty line in the buffer, for ever and ever.
The best way to fix this is to use the appropriate function, namely istream::getline() which extracts, but does not store the newline.
The EOF issue
is worth mentioning. The canonical way to read lines (if you want to write to a character buffer) is
while (f.getline(buf, bufSz))
{
cout << buf << "\n";
}
getline() returns a reference to the stream which in turn has a conversion function to bool, which makes it usable in a boolean expression like this. The conversion is true if input could be obtained. Interestingly, it may have encountered the end of file, and f.eof() would be true; but that alone does not make the stream convert to false. As long as it could extract at least one character it will convert to true, indicating that the last input operation made input available, and the loop will work as expected.
The next read after encountering EOF would then fail because no data could be extracted: after all, the read position is still at EOF. That is considered a read failure. The condition is then false and the loop is exited, which was exactly the intent.
The buffer size issue
is worth mentioning, as well. The standard draft says in 30.7.4.3:
Characters are extracted and stored until one of the following occurs:
end-of-file occurs on the input sequence (in which case the function calls setstate(eofbit));
traits::eq(c, delim) for the next available input character c (in which case the input character is extracted but not stored);
n is less than one or n - 1 characters are stored (in which case the function calls setstate(failbit)).
The conditions are tested in that order, which means that if n-1 characters have been stored and the next character is a newline (the default delimiter), the input was successful (and the newline is extracted as well).
This means that if your file contains a single line 123 you can read that successfully with f.getline(buf, 4), but not a line 1234 (both may or may not be followed by a newline).
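The buffer-size rule can be checked with a small experiment. The sketch below uses a hypothetical helper, fits_in_four, which tries to read the first line of a string into a four-character buffer with getline and reports whether the stream is still usable afterwards.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: true if the first line of `text` can be read
// with getline() into a buffer of 4 chars (3 characters plus the null).
bool fits_in_four(const std::string& text) {
    std::istringstream in(text);
    char buf[4] = {};
    // getline() sets failbit when n - 1 characters are stored and the
    // next character is neither end-of-file nor the delimiter.
    return static_cast<bool>(in.getline(buf, sizeof buf));
}
```

Under these assumptions, fits_in_four("123\n") and fits_in_four("123") are both true, while fits_in_four("1234\n") and fits_in_four("1234") are false.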
The line ending issue
Another complication here is that on Windows a file created with a typical editor will have a hidden carriage return before the newline, i.e. a line actually looks like "123\r\n" ("\r" and "\n" each being a single character with the values 13 and 10, respectively). Because you opened the file with the binary flag the program will see the carriage return; all lines will contain that "invisible" character, and the number of visible characters fitting in the buffer will be one shorter than one would assume.
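One common way to cope with this, sketched here as an assumption rather than as part of the original answer, is to read into a std::string and drop a trailing carriage return by hand. The function name read_lines_strip_cr is made up for the example.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads all lines from `in`, removing one trailing '\r' per line so that
// files with Windows line endings opened in binary mode look the same as
// files with plain '\n' endings.
std::vector<std::string> read_lines_strip_cr(std::istream& in) {
    std::vector<std::string> lines;
    for (std::string line; std::getline(in, line);) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        lines.push_back(line);
    }
    return lines;
}
```

Because std::string grows as needed, the invisible '\r' no longer eats into a fixed buffer size either.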
The console issue ;-)
Oh, and your Console was not entirely empty; it's just that modern computers are too fast and the first line which was probably printed (it was in my case) scrolled away faster than anybody could switch windows. When I looked closely there was a cursor in the bottom left corner where the program was busy printing line after line of nothing ;-).
The conclusion
Debug your programs. It's very easy with VS.
Use getline(istream, string).
Use the return value of input functions (typically the stream)
as a boolean in a while loop: "As long as you can extract any input, use that input."
Beware of line ending issues.
Consider C I/O (printf, scanf) for anything non-trivial (I didn't discuss this in my answer but I think that's what many people do).
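As a rough sketch of the second and third points, the loop below uses std::getline(istream&, string&) and tests the returned stream directly. The function copy_lines is a hypothetical name; it takes the streams as parameters so it works with an std::ifstream as well as with std::cin or a string stream.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Copies `in` to `out` line by line and returns the number of lines.
// The loop condition is the stream itself: it stays true as long as
// the last getline() managed to extract something.
std::size_t copy_lines(std::istream& in, std::ostream& out) {
    std::size_t count = 0;
    for (std::string line; std::getline(in, line);) {
        out << line << '\n';
        ++count;
    }
    return count;
}
```

A last line without a trailing newline is still delivered, because reaching end-of-file after extracting characters does not by itself make the stream convert to false.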

Read a file line-by-line twice using stringstream

I need to read a file line-by-line twice. The file content is expected to fit into memory. So, I would normally read the whole file into a buffer and work with that buffer afterwards.
However, since I would like to use std::getline, I need to work with a std::basic_istream. So, I thought it would be a good idea to write
std::ifstream file(filepath);
std::stringstream ss;
ss << file.rdbuf();
for (std::string line; std::getline(ss, line);)
{
}
However, I'm not sure what exactly is happening here. I guess ss << file.rdbuf(); does not read the file into any internal buffer of ss. Actual file access should occur only at std::getline(ss, line);.
So, with a second for-loop of the provided form, I should end in reading the whole file once again. That's inefficient.
Am I correct and hence need to come up with another approach?
I guess ss << file.rdbuf(); does not read the file into any internal buffer of ss. Actual file access should occur only at std::getline(ss, line);.
This is incorrect. cppreference.com has this to say about that operator<< overload:
basic_ostream& operator<<( std::basic_streambuf<CharT, Traits>* sb); (9)
9) Behaves as an UnformattedOutputFunction. After constructing and checking the sentry object, checks if sb is a null pointer. If it is, executes setstate(badbit) and exits. Otherwise, extracts characters from the input sequence controlled by sb and inserts them into *this until one of the following conditions are met:
end-of-file occurs on the input sequence;
inserting in the output sequence fails (in which case the character to be inserted is not extracted);
an exception occurs (in which case the exception is caught).
If no characters were inserted, executes setstate(failbit). If an exception was thrown while extracting, sets failbit and, if failbit is set in exceptions(), rethrows the exception.
So your assumption is incorrect. The entire contents of file is copied to the buffer controlled by ss, so reading from ss does not access the filesystem. You can freely read through ss and seek back to the beginning as many times as you like without incurring the overhead of re-reading the file each time.
After the first loop, clear the EOF and fail bits and go back to the beginning of the stringstream with:
ss.clear();
ss.seekg(0, std::ios::beg);
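Put together, a minimal sketch of the two-pass approach could look like this. The helper name count_lines_twice is made up, and it takes any std::istream as the source (an std::ifstream in the real program) so that it can be tried without a file on disk.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>

// Buffers `source` once into a stringstream, then walks the in-memory
// copy twice. Returns the number of lines seen in each pass.
std::pair<std::size_t, std::size_t> count_lines_twice(std::istream& source) {
    std::stringstream ss;
    ss << source.rdbuf();            // copies everything up to end-of-file

    std::size_t counts[2] = {0, 0};
    for (int pass = 0; pass < 2; ++pass) {
        for (std::string line; std::getline(ss, line);) {
            ++counts[pass];
        }
        ss.clear();                  // reset eofbit and failbit
        ss.seekg(0, std::ios::beg);  // rewind the in-memory copy
    }
    return std::make_pair(counts[0], counts[1]);
}
```

Only the ss << source.rdbuf() line touches the original source; both loops read from the stringstream's own buffer.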
Am I correct and hence need to come up with an other approach?
You're not correct. The "hence" is unwarranted also. There's not enough info in the question, but I suspect the problem has nothing to do with using a stream buffer.
Without knowing what that first "garbage" character is, I cannot say for sure, but I suspect the file is in a wide-character unicode format, and you are using access operations that do not work on wide characters. If that is the case, buffering the file has nothing to do with the problem.
As an experiment, try the following. Mind the w's.
std::wifstream file(filepath);
std::wstringstream ss;
ss << file.rdbuf();
for (int i = 0; i < 42; ++i) {
wchar_t ch;
ss >> ch;
std::cout << static_cast<unsigned>(ch) << ' ';
}
It would not surprise me if the first four numbers are 255 254 92 0, or 255 254 47 0.
This might help: Problem using getline with unicode files

istream::unget() in C++ doesn't work as I thought

unget isn't working the way I thought it would. Let me explain. As I understand it, unget takes the last character extracted from the stream and puts it back, ready to be extracted again. Internally, it decrements the pointer in the stream buffer (after creating the sentry and all that stuff).
But when I call unget() twice in a row, its behaviour gets deeply strange. If I write something like hello<bye and use < as the delimiter, then call getline followed by two ungets, I get hello and not o<bye. This is my code:
#include <iostream>
#define MAX_CHARS 256
using namespace std;
int main(){
char cadena[MAX_CHARS];
cout << "Write something: ";
cin.getline(cadena, MAX_CHARS, '<');
cout << endl << "Your first word delimited by < is: " << cadena << endl;
cin.unget(); //Delimiter (removed by getline) is put back in the stream
cin.unget(); //!?
cin >> cadena;
cout << "Your phrase with 2 ungets done..." << cadena;
return 0;
}
Try with bye<hello: then cadena gets bye and not e<hello. I thought that unget works on the last character each time it's called, so what the f*** is happening?
The problem you are observing isn't surprising at all. First off, note that ungetting characters may or may not be supported by the underlying stream buffer. Typically, at least one character of putback is supported. Whether this is actually true and if any more characters are supported is entirely up to the stream buffer.
What happens in your test program is simply that the second unget() fails, the stream goes into a failure state (unget() sets std::ios_base::badbit when the stream buffer cannot back up, so fail() reports true) and the next attempt to read something just fails. The failed read leaves the original buffer unchanged, and since the result isn't tested (as it should be), it looks as if the same string was read twice.
The fundamental reason std::cin is likely to support only one character of putback is that it is synchronized with stdin by default. As a result, std::cin doesn't do any buffering (causing it to be rather slow as well, for that matter). There is a fair chance that you can get better results by not synchronizing with stdin:
std::ios_base::sync_with_stdio(false);
This will improve performance and the likelihood that putting back more characters succeeds. There is still no guarantee that you can put multiple characters (or even just one character) back. If you really need to put back characters, you should consider using a filtering stream buffer which supports as many characters of putback as you need. In general, tokenizing input doesn't require any characters of putback, which is the basic reason that there is only mediocre support: since putback support is bad, you are best off using proper tokenizing, which reduces the need to improve putback. Somewhat of a circular argument. Since you can always create your own stream buffer it isn't really harmful, though.
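Whatever the stream, the result of unget() can be tested like any other stream operation. The following sketch uses a hypothetical helper, unget_checked, that stops at the first failed unget() and reports how many succeeded. It is easiest to try on an std::istringstream, whose buffer can usually back up all the way to the start, unlike std::cin.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Calls unget() up to `times` times and returns how many calls succeeded.
// A failed unget() leaves the stream in a failure state, which the caller
// can detect with in.fail() and reset with in.clear().
int unget_checked(std::istream& in, int times) {
    int done = 0;
    while (done < times && in.unget()) {
        ++done;
    }
    return done;
}
```

For the input hello<bye, reading up to < with getline and then calling unget_checked(in, 2) on a string stream gives back 2, and the next extraction yields o<bye. A return value smaller than requested is the signal that the stream needs in.clear() before any further reading.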
The actual reason for this behaviour is related to the stream's failure bits, as explained in the previous answer. Here is some workaround code that may help you achieve the results you want.
#include <iostream>
#include <boost/iostreams/filtering_stream.hpp>
// compile using g++ -std=c++11 -lboost_iostreams
#define MAX_CHARS 256
using namespace std;
int main(){
boost::iostreams::filtering_istream cinn(std::cin,0,1);
char cadena[MAX_CHARS];
cout << "Write something: ";
cinn.getline(cadena, MAX_CHARS, '<');
cout << endl << "Your first word delimited by < is: " << cadena << endl;
cinn.unget(); //Delimiter (removed by getline) is put back in the stream
cinn.unget(); //!?
cinn >> cadena;
cout << "Your phrase with 2 ungets done..." << cadena;
return 0;
}

Limiting input size with std::setw in std::cin

Let's say I have sample code:
std::string s;
std::cin >> std::setw(4) >> s;
std::cout << s;
Now for input abcdef the result will be abcd (when reading into a std::string, up to width() characters are stored), and for abcd it will be abcd too. The question is: how can I check whether the string was cut in the middle due to the limit, or whether the result is the complete input? I need to know whether the input fits or some data was skipped.
Although I know that the stream's width is considered when reading into a char* I wasn't aware that it is also considered when reading into a std::string. Assuming it is, reading would stop under three conditions:
The stream is completely read in which case eof() is set.
The next character is a space.
The number of characters which need to be read are read.
That is, you can check in.eof() and std::isspace(in.peek()). Well, if there is a funny std::ctype<char> facet used by the stream you'd really need to use
std::isspace(std::char_traits<char>::to_char_type(in.peek()),
in.getloc());
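A minimal sketch of that check, assuming the input is read into a std::string (where setw(n) stores up to n characters). The struct LimitedRead and the function read_limited are names invented for this example.

```cpp
#include <iomanip>
#include <istream>
#include <locale>
#include <sstream>
#include <string>

struct LimitedRead {
    std::string text;  // what operator>> stored
    bool ok;           // false if nothing could be extracted
    bool truncated;    // true if the word continues past the limit
};

// Reads one word of at most `limit` characters and reports whether the
// word was cut short, i.e. the next character is neither EOF nor a space.
LimitedRead read_limited(std::istream& in, int limit) {
    LimitedRead result{std::string(), false, false};
    if (!(in >> std::setw(limit) >> result.text)) {
        return result;               // extraction failed
    }
    result.ok = true;
    if (in.eof()) {
        return result;               // stopped at end of input
    }
    const std::istream::int_type next = in.peek();
    if (next != std::char_traits<char>::eof()) {
        result.truncated = !std::isspace(
            std::char_traits<char>::to_char_type(next), in.getloc());
    }
    return result;
}
```

The eof() test comes before peek() on purpose: calling peek() on a stream that already has eofbit set would put it into a failure state.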

Reading binary istream byte by byte

I was attempting to read a binary file byte by byte using an ifstream. I've used istream methods like get() before to read entire chunks of a binary file at once without a problem. But my current task lends itself to going byte by byte and relying on the buffering in the io-system to make it efficient. The problem is that I seemed to reach the end of the file several bytes sooner than I should. So I wrote the following test program:
#include <iostream>
#include <fstream>
int main() {
typedef unsigned char uint8;
std::ifstream source("test.dat", std::ios_base::binary);
while (source) {
std::ios::pos_type before = source.tellg();
uint8 x;
source >> x;
std::ios::pos_type after = source.tellg();
std::cout << before << ' ' << static_cast<int>(x) << ' '
<< after << std::endl;
}
return 0;
}
This dumps the contents of test.dat, one byte per line, showing the file position before and after.
Sure enough, if my file happens to have the two-byte sequence 0x0D-0x0A (which corresponds to carriage return and line feed), those bytes are skipped.
I've opened the stream in binary mode. Shouldn't that prevent it from interpreting line separators?
Do extraction operators always use text mode?
What's the right way to read byte by byte from a binary istream?
MSVC++ 2008 on Windows.
The >> extractors are for formatted input; they skip white space (by default). For single character unformatted input, you can use istream::get() (returns an int, either EOF if the read fails, or a value in the range [0,UCHAR_MAX]) or istream::get(char&) (puts the character read in the argument, returns something which converts to bool, true if the read succeeds, and false if it fails).
There is a read() member function in which you can specify the number of bytes.
Why are you using formatted extraction, rather than .read()?
source.get()
will give you a single byte. It is an unformatted input function.
operator>> is a formatted input function, which may imply skipping whitespace characters.
As others mentioned, you should use istream::read(). But, if you must use formatted extraction, consider std::noskipws.
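To compare the two approaches side by side, here is a small sketch with two hypothetical helpers: read_bytes uses the unformatted get(char&), and read_bytes_formatted uses operator>> as in the question. Both take any std::istream, so they can be tried on a string stream before pointing them at an std::ifstream opened in binary mode.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Unformatted: every byte is delivered, including 0x0D, 0x0A and spaces.
std::vector<unsigned char> read_bytes(std::istream& in) {
    std::vector<unsigned char> bytes;
    char c;
    while (in.get(c)) {
        bytes.push_back(static_cast<unsigned char>(c));
    }
    return bytes;
}

// Formatted: operator>> skips leading whitespace before each character,
// so whitespace bytes never reach the caller (unless std::noskipws is set).
std::vector<unsigned char> read_bytes_formatted(std::istream& in) {
    std::vector<unsigned char> bytes;
    unsigned char c;
    while (in >> c) {
        bytes.push_back(c);
    }
    return bytes;
}
```

For the four bytes 0x0D 0x0A 0x20 0x41, read_bytes returns all four, while read_bytes_formatted returns only 0x41.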