Puzzling behavior of istream::getline() - c++

I tested the following code to clarify my understanding of istream::getline():
#include <iostream>
#include <sstream>
using namespace std;
int main()
{
string s("abcd efgh\nijklmnopqrst");
string s1;
stringstream ss(s);
ss >> s1;
cout << s1 << endl;
ss.getline(&s1[0], 250, '\n');
cout << s1 << endl;
ss >> s1;
cout << s1 << endl;
getchar();
return 1;
}
then the console printed:
abcd
efg
ijklmnopqrst
but in my opinion it should be
abcd
efgh
ijklmnopqrst
Besides, I found that the size of s1 after calling ss.getline() was the same as after the first ss >>, but the size changes after calling ss >> once more. Can anyone help me understand this?

ss.getline(&s1[0], 250, '\n');
The first parameter to this getline() call is a char *. ss knows absolutely nothing about the fact that this char buffer actually comes from a std::string, and that it's actually the string's internal buffer.
Complicating this entire affair is the fact that this std::string is under the impression that it contains four characters. Because that's all it has, at this point.
And there is absolutely nothing, whatsoever, that could possibly lead this std::string to change its mind. Just because a pointer to its internal character buffer was passed to getline(), which proceeded to rather rudely scribble all over it (resulting in undefined behavior, as I'll explain in a moment), the std::string still believes that it contains only four characters.
Meanwhile, the initial formatted input operator, >>, extracted the initial word, but did not extract the following space. So when getline() was subsequently called on this stream, it started extracting characters at this space character and continued up to the next newline character: five characters (if I count on my fingers). It dumped them into a buffer that's guaranteed, by the std::string, to be only long enough to hold four characters (because, keep in mind, the initial formatted extraction operator, >>, only put four characters in it).
I'm ignoring some details, such as the fact that std::string takes care of automatically tacking on a trailing '\0', but the bottom line is that this is undefined behavior. The getline call extracts more characters than the buffer it's given is guaranteed to hold. Undefined behavior. A whole big heap of undefined behavior. It's not that the four characters in your second line of output are the wrong four characters; getline() actually extracted more characters, but the std::string being printed has every right under the constitution to believe that it still has only four characters, and that its internal buffer merely got stomped all over.

Two things.
First, >> stops at the whitespace that follows the word and leaves it in the stream, so getline will retrieve it.
Secondly, this line is not correct:
ss.getline(&s1[0], 250, '\n');
The member function istream::getline expects a char buffer. To read into a std::string, use the free function std::getline instead and just pass in the string:
std::getline(ss, s1, '\n');
In your code, &s1[0] gets access to the underlying buffer, which is written to, but the string's length is stored separately, and is still what it was from the previous read (which is why the h gets dropped). Though, at this point you've already invoked undefined behaviour due to a buffer overflow.

Related

Variable unexpectedly changed to 0 after cin to different variable

I have a problem with my program in c++: this program unexpectedly sets my variable n_alunni to 0 when I cin >> verify; even though I haven't written anything else to n_alunni.
This is the code:
#include <iostream>
#include <string>
using namespace std;
int main()
{
int n_alunni=1;
char verify[1];
cout<<"Inserisci il numero di alunni: ";
cin>>n_alunni; // I set it to something > 0
cout<<n_alunni; // n_alunni > 0
cout<<"Sicuro di voler inserire "<<n_alunni<<" alunni (y,n)? ";
cin>>verify; // <-- After I insert into verify, n_alunni suddenly equals 0
cout<<n_alunni; // n_alunni == 0
}
Can anyone help me please, thanks everyone in advance
As Peter said in his answer, and TomLyngmo in his comment, trying to cin to a char[] (verify) makes C++ think you want to write a C-string, which always has a 'null terminator' to indicate the end of the string. However, your char[] is only length 1, so unless you enter nothing at all, it will try to write beyond the end of the array, causing this undefined behavior (here, changing another variable).
Since your verify doesn't need to be a C-string, you could change it to just char verify and remove any of the array-access business from it. cin should then see it as a proper char and not try to write any null terminator after it, avoiding the memory overwrite you're seeing that results in unexpectedly changing the value of the other variable to 0.
The input into verify is writing beyond the array bounds, overwriting other memory, among it your variable. Use std::string instead or at least increase the array size beyond the expected input length (and, for a safe program, protect against boundary violations!).
In more detail, the array argument is, as is the case in many contexts, "adjusted" to a pointer to char; this matches the istream& operator>> (istream& is, char* s). This operator copies a "word" from the standard input into the memory pointed to. It skips any whitespace (for example, the newline left behind from when you last hit the enter key) and then copies characters from stdin to the indicated memory location; it stops before the next whitespace in the input (for example, the newline produced when you hit the enter key to "finish your input"). After the input characters are written the routine terminates the entered 1-character "word" with a null character so that it is a proper "C string" after the crude fashion that was modern in 1978.
If you entered a one-character word, that null character gets written to memory adjacent to your 1-char array, in this case n_alunni. (You can verify that hypothesis by entering a number into n_alunni that is larger than 255, so that it also occupies higher bytes which the single zero byte does not touch. On a little-endian architecture such as Intel, the new value of n_alunni after a one-character input should then be n_alunni & ~0xff; that is, the same value as after the input but with the lowest byte zeroed out.)
As is often the case, using std::string for text is a safer way to handle unknown text. There is an istream& operator>> (istream& is, string& str) that works just like the char * overload, only safer.

How do the different ways to read strings from console actually differ? Operator >>, getline or cin.getline?

Let's suppose I'd like to read an integer from the console, and I would not like the program to break if it is fed non-integer characters. This is how I would do this:
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
int main() {
string input; int n;
cin >> input;
if(!(stringstream(input)>>n)) cout << "Bad input!\n";
else cout << n;
return 0;
}
However, I see that http://www.cplusplus.com/doc/tutorial/basic_io/ uses getline(cin,input) rather than cin >> input. Are there any relevant differences between the two methods?
Also I wonder, since string is supposed not to have any length limits... What would happen if someone passed a 10GB long string to this program? Wouldn't it be safer to store the input in a limited-length char table and use, for example, cin.getline(input,256)?
std::getline gets a line (including spaces) and also reads (but discards) the ending newline. The input operator >> reads a whitespace-delimited "word".
For example, if your input is
123 456 789
Using std::getline will give you the string "123 456 789", but using the input operator >> you will get only "123".
And theoretically there's no limit to std::string, but in reality it's of course limited to the amount of memory it can allocate.
The first gets a line,
the second gets a word. If your input is "hello world":
getline(cin,str) : str=="hello world"
cin>>str: str=="hello"
And don't worry about going out of range: like vector, string can grow.
operator>> reads a word (i.e. up to next whitespace-character)
std::getline() reads a "line" (by default up to next newline, but you can configure the delimiter) and stores the result in a std::string
istream::getline() reads a "line" in a fashion similar to std::getline(), but takes a char-array as its target. (This is the one you'll find as cin.getline())
If you get a 10 GB line passed to your program, then you'll either run out of memory on a 32-bit system, or take a while to parse it, potentially swapping a fair bit of memory to disk on a 64-bit system.
Whether arbitrary line-length size limitations make sense, really boils down to what kind of data your program expects to handle, as well as what kind of error you want to produce when you get too much data. (Presumably it is not acceptable behaviour to simply ignore the rest of the data, so you'll want to either read and perform work on it in parts, or error out)

Mistake using scanf

Could you tell me what the mistake is in the following code?
char* line="";
printf("Write the line.\n");
scanf("%s",line);
printf(line,"\n");
I'm trying to get a line as input from the console. But every time I use scanf the program crashes. I don't want to use any std; I want to avoid cin and cout entirely. I'm just trying to learn how to take a full line as input using scanf().
Thank you.
You need to allocate the space for the input string, as scanf() cannot do that itself:
char line[1024];
printf("Write the line.\n");
scanf("%s",line);
printf("%s\n",line);
However this is dangerous as it's possible to overflow the buffer and is therefore a security concern. Use std::string instead:
std::string line;
std::cout << "Write the line." << std::endl;
std::cin >> line;
std::cout << line << std::endl;
or:
std::getline (std::cin, line);
Space is not allocated for line. You need to do something like
char *line = malloc(SOME_VALUE);
or
char line[SOME_VALUE];
Currently line is a pointer pointing at a string literal, and overwriting a string literal results in undefined behaviour.
scanf() doesn't match lines.
%s matches a single word.
#include <stdio.h>
int main() {
char word[101];
scanf("%100s", word);
printf("word <%s>\n", word);
return 0;
}
input:
this is a test
output:
word <this>
To match the line, use %100[^\n], which means up to 100 chars that aren't a newline.
#include <stdio.h>
int main() {
char word[101];
scanf("%100[^\n]", word);
printf("word <%s>\n", word);
return 0;
}
You are trying to change a string literal, which in C results in Undefined behavior, and in C++ is trying to write into a const memory.
To overcome it, you might want to allocate a char[] and assign it to line - or if it is C++ - use std::string and avoid a lot of pain.
You should allocate enough memory for line:
char line[100];
for example.
The %s conversion specifier in a scanf call expects its corresponding argument to point to a writable buffer of type char [N] where N is large enough to hold the input.
You've initialized line to point to the string literal "". There are two problems with this. First is that attempting to modify the contents of a string literal results in undefined behavior. The language definition doesn't specify how string literals are stored; it only specifies their lifetime and visibility, and some platforms stick them in a read-only memory segment while others put them in a writable data segment. Therefore, attempting to modify the contents of a string literal on one platform may crash outright due to an access violation, while the same thing on another platform may work fine. The language definition doesn't mandate what should happen when you try to modify a string literal; in fact, it explicitly leaves that behavior undefined, so that the compiler is free to handle the situation any way it wants to. In general, it's best to always assume that string literals are unwritable.
The other problem is that the array containing the string literal is only sized to hold 1 character, the 0 terminator. Remember that C-style strings are stored as simple arrays of char, and arrays don't automatically grow when you add more characters.
You will need to either declare line as an array of char or allocate the memory dynamically:
char line[MAX_INPUT_LEN];
or
char *line = malloc(INITIAL_INPUT_LEN);
The virtue of allocating the memory dynamically is that you can resize the buffer as necessary.
For safety's sake, you should specify the maximum number of characters to read; if your buffer is sized to hold 21 characters, then write your scanf call as
scanf("%20s", line);
Without the field width, if there are more characters in the input stream than line can hold, scanf will write those extra characters to the memory following line, potentially clobbering something important. Buffer overflows are a common malware exploit and should be avoided.
Also, %s won't get you the full line; it'll read up to the next whitespace character, even with the field width specifier. You'll either need to use a different conversion specifier like %[^\n] or use fgets() instead.
The pointer line which is supposed to point to the start of the character array that will hold the string read is actually pointing to a string literal (empty string) whose contents are not modifiable. This leads to an undefined behaviour manifested as a crash in your case.
To fix this change the definition to:
char line[MAX]; // set suitable value for MAX
and read at most MAX-1 characters into line.
Change:
char* line="";
to
char line[max_length_of_line_you_expect];
scanf is trying to write more characters than line has reserved. Reserve more characters than the longest line you expect, as pointed out by the answers above.

How to read a specific amount of characters from a text file

I tried to do it like this
#include <iostream>
#include <fstream>
using namespace std;
int main()
{
char b[2];
ifstream f("prad.txt");
f>>b ;
cout <<b;
return 0;
}
It should read 2 characters but it reads whole line. This worked on another language but doesn't work in C++ for some reason.
You can use read() to specify the number of characters to read:
char b[3] = "";
ifstream f("prad.txt");
f.read(b, sizeof(b) - 1); // Read one less than sizeof(b) so that b stays
cout << b; // null-terminated for use with cout.
This worked on another language but doesn't work in C++ for some
reason.
Some things change from language to language. In particular, in this case you've run afoul of the fact that in C++ an array decays to a pointer when it is passed to a function. That array gets passed to operator>> as a pointer to char, which is interpreted as a string pointer, so it does what it does to char buffers (to wit, read until the width limit or the next whitespace, whichever comes first). With no width set, any word longer than one character overflows your buffer, which is undefined behavior; a crash is one possible outcome, but it is not guaranteed.
istream& get (char* s, streamsize n );
Extracts characters from the stream and stores them as a c-string into
the array beginning at s. Characters are extracted until either (n -
1) characters have been extracted or the delimiting character '\n' is
found. The extraction also stops if the end of file is reached in the
input sequence or if an error occurs during the input operation. If
the delimiting character is found, it is not extracted from the input
sequence and remains as the next character to be extracted. Use
getline if you want this character to be extracted (and discarded).
The ending null character that signals the end of a c-string is
automatically appended at the end of the content stored in s.

Reading a fixed number of chars with >> on an istream

I was trying out a few file reading strategies in C++ and I came across this.
ifstream ifsw1("c:\\trys\\str3.txt");
char ifsw1w[3];
do {
ifsw1 >> ifsw1w;
if (ifsw1.eof())
break;
cout << ifsw1w << flush << endl;
} while (1);
ifsw1.close();
The content of the file were
firstfirst firstsecond
secondfirst secondsecond
When I see the output it is printed as
firstfirst
firstsecond
secondfirst
I expected the output to be something like:
fir
stf
irs
tfi
.....
Moreover I see that "secondsecond" has not been printed. I guess that the last read has met the eof and the cout might not have been executed. But the first behavior is not understandable.
The extraction operator has no concept of the size of the ifsw1w variable, and (by default) is going to extract characters until it hits whitespace, null, or eof. These are likely being stored in the memory locations after your ifsw1w variable, which would cause bad bugs if you had additional variables defined.
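To get the desired behavior, you should be able to use
ifsw1.width(3);
before each extraction to limit the number of characters to extract. Note that the width includes the terminating null character, so width(3) stores at most two characters, and that the width is reset to 0 after each extraction.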
It's virtually impossible to use std::istream& operator>>(std::istream&, char *) safely -- it's like gets in this regard -- there's no way for you to specify the buffer size. The stream just writes to your buffer, going off the end. (Your example above invokes undefined behavior). Either use the overloads accepting a std::string, or use std::getline(std::istream&, std::string).
Checking eof() is incorrect. You want fail() instead. You really don't care if the stream is at the end of the file, you care only if you have failed to extract information.
For something like this you're probably better off just reading the whole file into a string and using string operations from that point. You can do that using a stringstream:
#include <string> //For string
#include <sstream> //For stringstream
#include <fstream> //For ifstream
#include <iostream> //As before
std::ifstream myFile(...);
std::stringstream ss;
ss << myFile.rdbuf(); //Read the file into the stringstream.
std::string fileContents = ss.str(); //Now you have a string, no loops!
You're trashing the memory... it's reading past the 3 chars you defined (it reads until a space or a newline is met...).
Read char by char to achieve the output you had mentioned.
Edit : Irritate is right, this works too (with some fixes and not getting the exact result, but that's the spirit):
char ifsw1w[4];
do{
ifsw1.width(4);
ifsw1 >> ifsw1w;
if(ifsw1.eof()) break;
cout << ifsw1w << flush << endl;
}while(1);
ifsw1.close();
The code has undefined behavior. When you do something like this:
char ifsw1w[3];
ifsw1 >> ifsw1w;
The operator>> receives a pointer to the buffer, but has no idea of the buffer's actual size. As such, it has no way to know that it should stop reading after two characters (and note that it should be 2, not 3 -- it needs space for a '\0' to terminate the string).
Bottom line: in your exploration of ways to read data, this code is probably best ignored. About all you can learn from code like this is a few things you should avoid. It's generally easier, however, to just follow a few rules of thumb than try to study all the problems that can arise.
Use std::string to read strings.
Only use fixed-size buffers for fixed-size data.
When you do use fixed buffers, pass their size to limit how much is read.
When you want to read all the data in a file, std::copy can avoid a lot of errors:
std::vector<std::string> strings;
std::copy(std::istream_iterator<std::string>(myFile),
std::istream_iterator<std::string>(),
std::back_inserter(strings));
To read the whitespace, you could use "noskipws", which tells the stream not to skip leading whitespace.
ifsw1 >> noskipws >> ifsw1w;
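But if you want to get only 3 characters, I suggest you use the get method. Note that get(s, n) stores at most n - 1 characters plus the terminating null, so three characters need n = 4 and a buffer declared as char ifsw1w[4]:
ifsw1.get(ifsw1w,4);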