Extraction operator into non-numeric character - c++

Given an istringstream, is it possible to "extract" its contents into a character only if the character to be extracted is non-numeric (i.e. not 0-9)?
For example, this
string foo = "+ 2 3";
istringstream iss(foo);
char c;
iss >> skipws >> c; //Do this only if c would be non-numeric
should extract '+', but if foo were "2 + 3", it shouldn't extract anything, since the first [non-whitespace] character is '2', which is numeric.
To give some context, I need this to make a recursive "normal polish notation" (i.e. prefix notation) parser.

You can use unget to put back the character if it is numeric.
string foo = "+ 2 3";
istringstream iss(foo);
char c;
if (iss >> skipws >> c && std::isdigit(static_cast<unsigned char>(c)))
    iss.unget(); // c was numeric: put it back into the stream

You can use istream::peek to check what the next character will be before extracting it. You can test the result of peek against your acceptable range, and if it matches, do the actual extraction.
BTW, if you want to skip whitespace, you'll also need to check for and handle that with peek() (by extracting and discarding whitespace characters). Even with skipws, peek() won't peek past the whitespace.
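A minimal sketch of the peek approach, assuming a hypothetical helper named extract_if_not_digit (it is not part of the standard library). It uses std::ws to discard leading whitespace first, since peek() does not skip it, and consumes the character only when it is not a digit.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: extracts the next non-whitespace character into c
// only if it is not a decimal digit. Returns true if a character was extracted.
bool extract_if_not_digit(std::istream& in, char& c)
{
    in >> std::ws;                       // discard leading whitespace; peek() will not do this
    const int next = in.peek();          // look at the next character without consuming it
    if (next == std::char_traits<char>::eof())
        return false;                    // nothing left to read
    if (std::isdigit(static_cast<unsigned char>(next)))
        return false;                    // numeric: leave it in the stream
    in.get(c);                           // non-numeric: consume it
    return true;
}
```

When the function returns false because of a digit, the digit is still in the stream, so a following `in >> number` reads it as usual.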

Related

Extraction Operator reaching EOF on istringstream, behavioral change with int/char/string

So I am doing some simple file I/O in C++ and I noticed this behaviour. I'm not sure if I am forgetting something about the extraction operator and chars.
Note that the file is in Unix format.
ifstream infile("test.txt");
string line;
while(getline(infile, line)){
istringstream iss(line);
<type> a;
for(...){
iss >> a;
}
if(iss.eof())
cout << "FAIL" << endl;
}
Say that the input file test.txt looks like this and the <type> of a is int
$ is the newline character (as displayed by :set list in vim)
100 100 100$
100 100 100$
what I notice is that after the first line is read, EOF is set true;
If the input file is like so, and the <type> of a is char:
a b c$
a b c$
Then the code behaves as expected.
From what I understand about file I/O and the extraction operator, leading whitespace is skipped, and after an extraction the read position sits on the character following the extracted input. So in both cases, at the end of each stringstream the read position should be on the newline character, and it shouldn't be at EOF.
Changing the <type> of a to string fails the same way as <type> = int.
BTW, failbit is not set. At the end:
good = 0
fail = 0
eof = 1
getline has extracted and discarded the newline, so line contains 100 100 100, not 100 100 100$, where $ is representing the newline. This means reading all three tokens from the line with a stringstream and the >> operator may reach the EOF and produce the FAIL message.
iss >> a; when a is an int or a string will skip all preceding whitespace and then continue extracting until it reaches a character that can't possibly be part of an int or is whitespace or is the end of the stream. On the third >> from the stream, the end of the stream stops the extraction and the stream's EOF flag is set.
iss >> a; when a is a char will skip all preceding whitespace and then extract exactly one character. In this case the third >> will extract the final character and stop before seeing the end of the stream and without setting the EOF flag.
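The difference can be checked with a small sketch. The helper name eof_after_three is made up for illustration. It extracts three values of type T from a line that has no trailing newline, as getline would produce, and reports whether eofbit is set afterwards.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: reads three values of type T from `line` with >>
// and reports whether the stream's eofbit is set afterwards.
template <typename T>
bool eof_after_three(const std::string& line)
{
    std::istringstream iss(line);
    T a{};
    for (int i = 0; i < 3; ++i)
        iss >> a;                 // same extraction as in the question
    return iss.eof();
}
```

With this helper, eof_after_three<int>("100 100 100") and eof_after_three<std::string>("a b c") report true, while eof_after_three<char>("a b c") reports false, because the char overload stops after one character and never looks past it.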

How to delimit this text file? strtok

So there's a text file where each line holds 1. a language, 2. the text of a number written in that language, 3. the base of the number, and 4. the number written in digits. Here's a sample:
francais deux mille quatre cents 10 2400
How I went about it:
struct Nomen{
char langue[21], nomNombre [31], baseC[3], nombreC[21];
int base, nombre;
};
and in the main:
if(myfile.is_open()){
{
while(getline(myfile, line))
{
strcpy(Linguo[i].langue, strtok((char *)line.c_str(), " "));
strcpy(Linguo[i].nomNombre, strtok(NULL, " "));
strcpy(Linguo[i].baseC, strtok(NULL, " "));
strcpy(Linguo[i].nombreC, strtok(NULL, "\n"));
i++;
}
Difficulty: I'm trying to put two whitespaces as a delimiter, but it seems that strtok() counts it as if there were only one whitespace. The fact there are spaces in the text number, etc. is messing up the tokenization. How should I go about it?
strtok treats any single character in the provided string as a delimiter. It does not treat the string itself as a single delimiter. So "  " (two spaces) is the same as " " (one space).
strtok will also treat multiple delimiters together as a single delimiter. So the input "t1   t2" will be tokenized as two tokens, "t1" and "t2".
As mentioned in comments, strtok also writes the NUL character into the input to create the token strings. So it is an error to pass the result of string::c_str() as input to the function. The fact that you need to cast away const should have been enough to dissuade you from this approach.
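A small sketch of these points, assuming a hypothetical helper named split_with_strtok. It copies the input into a writable buffer first, because strtok modifies its argument, and it shows that consecutive delimiters produce no empty tokens.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes `input` with std::strtok using the delimiter set `delims`.
// strtok writes NUL characters into its argument, so it operates on a private copy.
std::vector<std::string> split_with_strtok(const std::string& input, const char* delims)
{
    std::vector<char> buffer(input.begin(), input.end());
    buffer.push_back('\0');                      // strtok needs a NUL-terminated, writable array

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delims);
         tok != nullptr;
         tok = std::strtok(nullptr, delims))
    {
        tokens.push_back(tok);
    }
    return tokens;
}
```

Calling it with a two-space delimiter string gives the same tokens as a one-space delimiter string, which is why the double space cannot be used to keep the multi-word text field together.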
If you want to treat a double space as a delimiter, you will have to scan the string and search for them yourself. Given you are using C APIs, you can consider strstr. However, in C++, you can use string::find.
Here's an algorithm to parse your string manually:
Given an input string input:
language is the substring from the start of input to the first SPC character.
From where language ends, skip over all whitespace, changing input to begin at the first non-whitespace character.
text is the substring from the start of input to the first double SPC sequence.
From where text ends, skip over all whitespace, changing input to begin at the first non-whitespace character.
Parse base, and parse number.
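The steps above can be sketched with std::string::find. The struct and function names (Entry, parse_entry) are hypothetical, and the sketch assumes the fields are separated by two spaces while the words inside the text field are separated by one.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical record type for one line of the file.
struct Entry {
    std::string language;
    std::string text;
    int base = 0;
    long number = 0;
};

// Parses "language  text with spaces  base  number" (assumed double-space separators).
// Returns false if the line does not have that shape.
bool parse_entry(const std::string& input, Entry& out)
{
    const std::string ws = " \t";

    // 1. language: from the start up to the first space.
    const std::size_t lang_end = input.find(' ');
    if (lang_end == std::string::npos) return false;
    out.language = input.substr(0, lang_end);

    // 2. Skip whitespace to reach the start of the text field.
    const std::size_t text_begin = input.find_first_not_of(ws, lang_end);
    if (text_begin == std::string::npos) return false;

    // 3. text: up to the first double-space sequence.
    const std::size_t text_end = input.find("  ", text_begin);
    if (text_end == std::string::npos) return false;
    out.text = input.substr(text_begin, text_end - text_begin);

    // 4. The rest holds base and number; >> skips the remaining whitespace.
    std::istringstream rest(input.substr(text_end));
    return static_cast<bool>(rest >> out.base >> out.number);
}
```

Using std::string members instead of fixed-size char arrays avoids the strcpy overflow risk in the original struct, at the cost of changing its layout.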

C++ if statement not parsing whitespace

My if statement in a c++ console application is not working:
string line;
cin >> line;
if (line == "a b"){
cout << "lalala";
}
When I type "a b" nothing happens.
If I use if (line == "ab") and type ab, it works.
std::cin reads delimited by white-space. So when you read cin >> line you are only reading "a" from "a b". Use std::getline(std::cin, line) to read the entire line, including white-space (not '\n').
BTW, you could have easily found this problem by looking at your variables with a debugger, or by printing it. Debug your code before posting questions.
NOTE: By std::cin I mean the operator>>, which has many overloads, as non-members and istream members. The overload that takes a std::string reads delimited by white-space.
Because reading stops at the first whitespace character, line will be "a" here.
See operator>>(std::basic_string): characters are read until one of the following conditions becomes true:
std::isspace(c,is.getloc()) is true for the next character c in is (this whitespace character remains in the input stream).
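A minimal sketch of the difference, written against std::istream so that a std::istringstream can stand in for std::cin with fixed input. The helper names are made up for illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads with operator>>, which stops at the first whitespace.
std::string read_token(std::istream& in)
{
    std::string s;
    in >> s;
    return s;
}

// Hypothetical helper: reads with std::getline, which stops at '\n' and discards it.
std::string read_line(std::istream& in)
{
    std::string s;
    std::getline(in, s);
    return s;
}
```

For the input "a b" followed by a newline, read_token yields "a" and read_line yields "a b", so only the getline version makes the comparison against "a b" succeed.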

Alternative to std::istream::ignore

std::istream::ignore discards characters until one compares equal to delim. Is there an alternative working on strings rather than chars, i.e. one that discards strings until one compares equal to the specified string?
The easiest way would be to continuously extract a string until you find the right one:
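std::istringstream iss("skip these words findme rest"); // sample input
std::string str;
std::string pattern = "findme"; // must be a single token: >> never yields whitespace
while (iss >> str && str != pattern)
    ; // discard tokens until the pattern is found
if (!iss) { /* pattern not found, or a read error occurred */ }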
This assumes that the strings are delimited with whitespace characters, of course.
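The same idea can be wrapped in a function that behaves a little like ignore for whole words. This is a sketch only, and the name ignore_until_word is an assumption, not a standard facility.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: discards whitespace-delimited tokens up to and including
// the first one equal to `word`. Returns false if the stream ends first.
bool ignore_until_word(std::istream& in, const std::string& word)
{
    std::string token;
    while (in >> token) {
        if (token == word)
            return true;          // the stream is now positioned right after `word`
    }
    return false;                 // word not found, or a read error occurred
}
```

After a successful call, the next extraction reads whatever follows the matched word.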

reading a file word by word

I can read from a file one character at a time, but how do I make it read one word at a time? That is, read until there is a space and take that as a string.
This gets me the characters:
while (!fin.eof()){
while (fin>> f ){
F.push_back ( f );
}
If your f variable is of type std::string and F is std::vector<std::string>, then your code should do exactly what you want, leaving you with a list of "words" in the F vector. I put words in quotes because punctuation at the end of a word will be included in the input.
In other words, the >> operator automatically stops at whitespace (or eof) when the target variable type is a string.
Try this:
std::string word;
while (fin >> word)
{
F.push_back(word);
}
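A self-contained version of that loop might look like the following sketch. The function name read_words is hypothetical; it takes any std::istream, so it works with an open std::ifstream as well as with a std::istringstream.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects every whitespace-delimited word from `in`.
std::vector<std::string> read_words(std::istream& in)
{
    std::vector<std::string> words;
    std::string word;
    while (in >> word)            // the loop ends when extraction fails (normally at end of input)
        words.push_back(word);
    return words;
}
```

Testing the extraction itself in the loop condition avoids the extra `while (!fin.eof())` wrapper. Punctuation stays attached to the words, so "Hello, world" yields "Hello," and "world".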