I'm writing a parser, and I was having trouble parsing identifiers (anything that is a valid C++ variable name) and unclosed string literals (anything that starts with " but is missing the closing ") at the end of my input. I think the cause is that the lexer (TokenStream) uses std::noskipws in these cases and builds the token character by character. Here is where I believe I have narrowed down the problem (shown for only one of the two cases, since the other uses very similar logic):
std::string TokenStream::get()
{
char c;
(*input) >> c; // input is of type istream*
// other cases...
if (c == '"')
{
std::string s = stringFromChar(c); // just makes a string from the char.
char d;
while (true) // 1)
{
(*input) >> std::noskipws >> d;
std::cout << d; // 2)
if (d == '"')
{
s += d;
(*input) >> std::skipws;
break;
}
s += d;
}
return s;
}
// other cases...
}
Note that this function is supposed to just generate tokens from the input in a stream-like fashion. Now, if I input either an identifier (like asdf) or an unclosed string (like "asdf), the program hangs, and the line marked 2) outputs the last character of the input (in my examples, f) over and over again forever.
I've solved this problem by using a check for input->eof(), but my question is this:
Why does the loop (marked 1) in comments) keep executing when I hit the end of stream, and why does it just print that last character read every time through the loop?
Let's look at the loop in question line by line:
while (true) // 1)
That's gonna loop, unless a break is encountered
{
(*input) >> std::noskipws >> d;
Read a character. If the read fails (for example, at end of stream), the stream enters a failed state and d keeps its previous value.
std::cout << d; // 2)
Print d, which is still the previous character if the read failed
if (d == '"')
Nope, the last character was not " (as specified in the question)
{
s += d;
(*input) >> std::skipws;
break;
}
s += d;
}
Therefore the break is never encountered and the last character is printed in an endless loop.
Fix: always use a while loop like this for input:
char ch;
while (*input >> ch) { // input is an istream*, as in the question
// ch contains a new letter, deal with it
}
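As a minimal sketch of that pattern applied to the string-literal case from the question: the helper below makes the read itself the loop condition, so the loop ends when the stream runs out instead of reusing the last character. The name readQuoted is hypothetical (it is not part of the asker's TokenStream), and the sketch assumes the opening quote has already been consumed.

```cpp
#include <istream>
#include <sstream>   // only needed for the usage example below
#include <string>

// Hypothetical helper: reads the rest of a string literal after the opening
// quote. Returns true if the closing quote was found, false if the input
// ended first (an unclosed literal).
bool readQuoted(std::istream& in, std::string& token)
{
    token = "\"";
    char d;
    // get() never skips whitespace, so no noskipws/skipws switching is needed.
    while (in.get(d)) {
        token += d;
        if (d == '"')
            return true;   // closed literal
    }
    return false;          // the read failed, typically at end of stream
}
```

For example, with std::istringstream in("asdf"), readQuoted(in, token) returns false and leaves token holding the partial literal, so the caller can report the unclosed string.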
First of all, I should mention that I found several closely related questions, e.g. here and here. However, I do not want to use printf, nor do I want to use \n (because I already know that it does not work).
Is it possible for the user to enter a newline, probably an escape sequence, without hitting enter?
As an example:
#include <iostream>
#include <string>
int main () {
std::string a,b;
std::cin >> a;
std::cin >> b;
std::cout << a << "\n" << b;
}
Is it possible for a user to provide a single line of input
hello ??? world
such that the above prints
hello
world
?
You can do it like this:
std::string a, b;
std::cin >> a>>b;
std::cout << a << "\n" << b;
The user can then enter both words on one line, separated by a space.
(I assume that you do not want spaces to delimit strings. For example,
Foo bar ??? baz qux
should be two lines.)
It is not possible to configure the streams so that ??? is automatically converted to a newline character. For the user to input a newline character, they have to input a newline character, not anything else.
You have to parse it yourself.
Here's an example parser that treats ??? as delimiter:
void read_string(std::istream& is, std::string& dest)
{
std::string str = "";
for (char c; is.get(c);) {
switch (c) {
case '?':
if (is.get(c) && c == '?') {
if (is.get(c) && c == '?') {
dest = str;
return;
} else {
str += "??";
}
} else {
str += "?";
}
// deliberate fall-through: also append the character that ended the run of '?'
default:
str += c;
}
}
}
For example, the input
? is still one question mark????? is still two question marks???
is parsed as two lines:
? is still one question mark
?? is still two question marks
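If the whole input is already held in a std::string, an alternative to the character-by-character stream parser is to search for the delimiter with std::string::find. This is a different technique from the parser above, shown only as a rough sketch; splitOn is a hypothetical helper name, and it assumes the delimiter is not empty.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits text at every occurrence of delim.
// Assumes delim is non-empty (an empty delimiter would never advance).
std::vector<std::string> splitOn(const std::string& text, const std::string& delim)
{
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = text.find(delim, start)) != std::string::npos) {
        parts.push_back(text.substr(start, pos - start));
        start = pos + delim.size();   // continue after the delimiter
    }
    if (start < text.size())
        parts.push_back(text.substr(start));   // trailing piece with no delimiter
    return parts;
}
```

Because find returns the earliest match, a run of five question marks is treated as a delimiter followed by two literal question marks, which matches the example input above.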
I was solving a question on hackerrank and came across this problem involving string streams.
https://www.hackerrank.com/challenges/c-tutorial-stringstream/problem
For Extracting data, hackerrank has given an example:
stringstream ss("23,4,56");
char ch;
int a, b, c;
ss >> a >> ch >> b >> ch >> c; // a = 23, b = 4, c = 56
However, when I try to export it to a vector, I have to escape the ',' using:
stringstream ss(str);
vector<int> vect;
int i;
while (ss >> i)
{
vect.push_back(i);
if (ss.peek() == ',')
ss.ignore();
}
Why can't I use the extraction operator to get the required value here? Shouldn't the stream skip the ','? (Sorry for the noob-level question.)
operator>> extracts the next whitespace-delimited token, but only as far as the characters actually belong to the requested data type. So, when using operator>> to read an int, it extracts only an optional sign and digits, not letters, punctuation, etc. That means a comma following a number has to be read separately.
In the first example:
ss >> a reads the first int in the stream
then >> ch reads the comma after it
then >> b reads the next int
then >> ch reads the comma after it
then >> c reads the next int
In the second example:
ss >> i reads the next int in the stream, breaking the loop if the read fails or hits EOF
then ss.peek() checks if a comma exists (since the last int doesn't have one), and if found then ss.ignore() skips past it
then go back to step 1
If you try to use operator>> to read a comma that doesn't exist, it will set the stream's eofbit state and fail the extraction. If you use while (ss >> i >> ch), the while would evaluate as false when the last int is reached. Even though ss >> i would succeed, >> ch would fail, and thus i would not be added to the vector.
In theory, you could replace if (ss.peek() == ',') ss.ignore(); inside the loop with char ch; ss >> ch; instead. The end effect would be the same, at least for a string like "23,4,56". But suppose you were given something like "23 4 56" instead. The first example would fail to handle that correctly. The second example handles it fine with peek()+ignore(), but not with ss >> ch, because ss >> ch skips the whitespace and then consumes the first digit of the next number.
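The difference between the two variants can be sketched as follows. Both function names are hypothetical and exist only for this comparison; the loop bodies follow the question's code and the ss >> ch replacement described above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Variant from the question: skip a comma only if one is actually there.
std::vector<int> parseWithPeek(const std::string& text)
{
    std::stringstream ss(text);
    std::vector<int> values;
    int i;
    while (ss >> i) {
        values.push_back(i);
        if (ss.peek() == ',')
            ss.ignore();
    }
    return values;
}

// Variant using ss >> ch: always consumes the next non-whitespace character,
// whatever it is.
std::vector<int> parseWithChar(const std::string& text)
{
    std::stringstream ss(text);
    std::vector<int> values;
    int i;
    while (ss >> i) {
        values.push_back(i);
        char ch;
        ss >> ch;   // for "23 4 56" this swallows the '4'
    }
    return values;
}
```

Both variants yield 23, 4, 56 for "23,4,56". For "23 4 56", parseWithPeek still yields all three numbers, while parseWithChar yields only 23 and 56.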
I think you can use this code to skip the ',':
std::string valstr;
while (std::getline(ss, valstr, ','))
{
vect.push_back(std::stoi(valstr));
}
I have a stupid question.
I have a .txt file. Once it is opened, I need to take only the numbers and skip the words.
Is there any way to check whether the next element is a word or not?
My file looks like this: word 1 2 word 1 2 3 4 5 6...
int n,e;
string s;
ifstream myfile("input.txt");
So I think it's a clumsy way around the problem to read each word into a string and then take the numbers, like this:
myfile >> s;
myfile >> n;
myfile >> e;
You can do the following
int num = 0;
while(myfile >> num || !myfile.eof()) {
if(myfile.fail()) { // Number input failed, skip the word
myfile.clear();
string dummy;
myfile >> dummy;
continue;
}
cout << num << endl; // Do whatever necessary with the next number read
}
When you read each token into a std::string, you must check whether that string is a number before using it. One way to convert a string to an integer (if that string is an integer) is the atoi() function.
Be careful, though: you must pass it a C string (for example, s.c_str()), and atoi() returns 0 for input it cannot convert, so it cannot tell "0" apart from a word.
You can read every token as a string and try to convert it to an integer inside a try { } catch () { } block, using a conversion that throws on failure such as std::stoi. If the token really is an integer, do the work in the try section; otherwise control goes to the catch block, where you do nothing.
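A rough sketch of that try/catch idea, assuming std::stoi as the conversion function (the answer does not name one). The helper name extractNumbers is hypothetical, and rejecting tokens such as "12abc" is a design choice of this sketch rather than something the answer specifies.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>   // only needed for the usage example below
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads whitespace-separated tokens and keeps only
// those that are complete integers.
std::vector<int> extractNumbers(std::istream& in)
{
    std::vector<int> numbers;
    std::string token;
    while (in >> token) {
        try {
            std::size_t used = 0;
            int value = std::stoi(token, &used);
            if (used == token.size())   // whole token was consumed
                numbers.push_back(value);
        } catch (const std::invalid_argument&) {
            // token does not start with a number: skip it
        } catch (const std::out_of_range&) {
            // number does not fit in an int: skip it
        }
    }
    return numbers;
}
```

With std::istringstream in("word 1 2 word 1 2 3") the result holds 1, 2, 1, 2, 3; an std::ifstream can be passed in the same way.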
Oops, it's already solved. But it's worth mentioning that it is also possible to:
either read individual chars from the stream and putback() the digits before using operator>>
or peek() at the next char in the stream without reading it, to decide whether to ignore it or to use operator>>
Just be careful about '-', which is not a digit but could be the sign of an integer.
Here is a small example:
int c, n, sign=1;
ifstream ifs("test.txt", std::ifstream::in);
while (ifs.good() && (c=ifs.peek())!=EOF ) {
if (isdigit(c)) {
ifs >> n;
n *= sign;
sign = 1;
cout << n << endl;
}
else {
c=ifs.get();
if (c == '-')
sign = -1;
else sign = 1;
}
}
ifs.close();
It's not the most performant approach, however it has the advantage of only reading from stream, without intermediary strings and memory management.
How do I get rid of the leading ' ' and '\n' symbols when I'm not sure I'll get a cin, before the getline?
Example:
int a;
char s[1001];
if(rand() == 1){
cin >> a;
}
cin.getline(s, sizeof s); // getline on a char array needs the buffer size
If I put a cin.ignore() before the getline, I may lose the first symbol of the string, so is my only option to put it after every use of 'cin >>' ? Because that's not very efficient way to do it when you are working on a big project.
Is there a better way than this:
int a;
string s;
if(rand() == 1){
cin >> a;
}
do getline(cin, s); while(s == "");
Like this:
std::string line, maybe_an_int;
if (rand() == 1)
{
if (!std::getline(std::cin, maybe_an_int))
{
std::exit(EXIT_FAILURE);
}
}
if (!std::getline(std::cin, line))
{
std::exit(EXIT_FAILURE);
}
int a = std::stoi(maybe_an_int); // this may throw an exception
You can parse the string maybe_an_int in several different ways. You could also use std::strtol, or a string stream (under the same condition as the first if block):
std::istringstream iss(maybe_an_int);
int a;
if (!(iss >> a >> std::ws) || iss.get() != EOF)
{
std::exit(EXIT_FAILURE);
}
You could of course handle parsing errors more gracefully, e.g. by running the entire thing in a loop until the user inputs valid data.
Both the space character and the newline character are classified as whitespace by standard IOStreams. If you are mixing formatted I/O with unformatted I/O and you need to clear the stream of residual whitespace, use the std::ws manipulator:
if (std::getline(std::cin >> std::ws, s)) {
}
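Put together with the optional integer read from the question, the std::ws approach might look like the sketch below. The helper name readIntThenLine and its wantInt parameter are hypothetical and stand in for the question's rand() condition.

```cpp
#include <istream>
#include <sstream>   // only needed for the usage example below
#include <string>

// Hypothetical helper: optionally reads an int, then reads the next line
// that contains something other than whitespace.
bool readIntThenLine(std::istream& in, bool wantInt, int& a, std::string& line)
{
    if (wantInt && !(in >> a))
        return false;                        // the integer could not be read
    // std::ws discards leading spaces, tabs and newlines, including the
    // newline left behind by the formatted read above.
    return static_cast<bool>(std::getline(in >> std::ws, line));
}
```

Note that std::ws also strips leading spaces from the line itself, so this only fits input where leading whitespace is not significant.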
I am reading a std::istream and I need to verify without extracting characters that:
The stream is not "empty", i.e. that trying to read a char will not result in a fail state (solved by using the peek() member function and checking the fail state, then setting it back to the original state)
That among the characters left there is at least one which is not a space, a tab or a newline char.
The reason is that I am reading text files containing, say, one int per line, and sometimes there may be extra spaces or newlines at the end of the file, which causes issues when I try to read the data from the file back into a vector of int.
A peek(int n) would probably do what I need but I am stuck with its implementation.
I know I could just read istream like:
while (myIstream >> myInt) {…} // Will fail when I am at the end
but the same check would fail for a number of different conditions (say I have something which is not an int on some line) and being able to differentiate between the two reading errors (unexpected thing, nothing left) would help me to write more robust code, as I could write:
while (something_left(myIstream)) {
myIstream >> myInt;
if (myIstream.fail()) {…} //Horrible things happened
}
Thank you!
There is a function called ws which eats whitespace. Perhaps you could call that after each read. If that hits eof, then you know you've got a normal termination. If it doesn't and the next read doesn't produce a valid int, then you know you've got garbage in your file. Maybe something like:
#include <fstream>
#include <iostream>
int main()
{
std::ifstream infile("test.dat");
while (infile)
{
int i;
infile >> i;
if (!infile.fail())
std::cout << i << '\n';
else
std::cout << "garbage\n";
ws(infile);
}
}
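The same idea can be packaged as the something_left check the question asks for. This is only a sketch under one stated assumption: it does consume trailing whitespace (via std::ws), so it is not strictly a check "without extracting characters", but it never consumes a non-whitespace character. The readInts wrapper is a hypothetical name added for illustration.

```cpp
#include <istream>
#include <sstream>   // only needed for the usage example below
#include <vector>

// Returns true if at least one non-whitespace character remains.
bool something_left(std::istream& in)
{
    if (!in.good())
        return false;    // already at end of stream or in an error state
    in >> std::ws;       // discard spaces, tabs and newlines
    return in.good();    // std::ws sets eofbit if it ran into the end
}

// Hypothetical wrapper: reads every int; returns false on a non-integer token.
bool readInts(std::istream& in, std::vector<int>& out)
{
    while (something_left(in)) {
        int value;
        if (!(in >> value))
            return false;    // something is left, but it is not an int
        out.push_back(value);
    }
    return true;             // normal termination: nothing but whitespace left
}
```

With input such as "1\n2\n3\n  \n\n" readInts returns true and collects 1, 2, 3; with "1\nx\n2" it returns false after collecting 1, which separates the two failure cases described in the question.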
This is what I did to skip whitespace and detect EOF before the actual input:
char c;
if (!(cin >> c)) //skip whitespace
return false; // EOF or other error
cin.unget();
This is independent of what data you are going to read.
This code relies on the skipws flag being set by default for standard streams, but it can also be set manually: cin >> skipws >> c;
Doesn't a simple loop like this work?
for(;;){
if(!(myIstream >> myInt)){
if(myIstream.eof()) {
//end of file
}else{
//not an integer
}
break; // stop in either case; without this the loop never ends
}
// Do something with myInt
}
Why do you need to know if there are numbers left?
Edit Changed to Ben's proposition.
The usual way to handle this situation is not to avoid reading from the stream, but to put back characters, which have been read, if needed:
void clean_input(std::istream& in); // defined below; declared here so get_int can call it
int get_int(std::istream& in)
{
int n = 0;
while(true) {
if (in >> n)
return n;
clean_input(in);
}
}
void clean_input(std::istream& in)
{
if (in.fail()) {
in.clear();
// throw away (skip) pending characters in input
// which are non-digits
char ch;
while (in >> ch) {
if (isdigit(ch)) {
// stuff digit back into the stream
in.unget();
return;
}
}
}
error("No input"); // eof or bad; error() is assumed to be a user-defined helper that throws
}