Tokenize stringstream based on type - c++

I have an input stream containing integers and a special character, '#'. It looks as follows:
... 12 18 16 # 22 24 26 15 # 17 # 32 35 33 ...
The tokens are separated by space. There's no pattern for the position of '#'.
I was trying to tokenize the input stream like this:
int value;
std::ifstream input("data");
if (input.good()) {
string line;
while(getline(data, line) != EOF) {
if (!line.empty()) {
sstream ss(line);
while (ss >> value) {
//process value ...
}
}
}
}
The problem with this code is that the processing stops when the first '#' is encountered.
The only solution I can think of is to extract each token into a string and, unless it is '#', use atoi() to convert it to an integer. However, that seems very inefficient, as the majority of tokens are integers; calling atoi() on every token introduces a big overhead.
Is there a way I can parse each token according to its type? I.e., for integers, parse it as an integer, while for '#', skip it. Thanks!

One possibility would be to explicitly skip whitespace (ss >> std::ws), and then to use ss.peek() to find out if a # follows. If yes, use ss.get() to read it and continue, otherwise use ss >> value to read the value.
If the positions of # don't matter, you could also remove all '#' from the line before initializing the stringstream with it.
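A minimal sketch of the peek-based approach described above. The function name parseLine is an assumption made for illustration, and the sketch simply stops at the first token that is neither an integer nor '#'.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads the integers from one line and skips every '#'.
std::vector<int> parseLine(const std::string& line) {
    std::istringstream ss(line);
    std::vector<int> values;
    int value;
    while (ss >> std::ws) {                      // explicitly skip leading whitespace
        int next = ss.peek();                    // look at the next character without consuming it
        if (next == std::char_traits<char>::eof()) break;
        if (next == '#') {                       // special marker: consume it and move on
            ss.get();
            continue;
        }
        if (!(ss >> value)) break;               // neither '#' nor an integer: give up
        values.push_back(value);
    }
    return values;
}
```

Calling parseLine("12 18 16 # 22") would collect 12, 18, 16 and 22; a real program would do its special processing at the point where the '#' is consumed.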

Usually not worth testing against good()
if (input.good()) {
Unless your next operation is generating an error message or exception. If it is not good all further operations will fail anyway.
Don't test against EOF.
while(getline(data, line) != EOF) {
The result of std::getline() is not an integer. It is a reference to the input stream. The input stream is convertible to a bool-like object that can be used in a boolean context (such as while, if, etc.), and it evaluates to true as long as the read succeeded. So what you want to do is:
while(getline(data, line)) {
I am not sure I would read a line. You could just read a word (since the input is space separated). Using the >> operator on string
std::string word;
while(data >> word) { // reads one space separated word
Now you can test the word to see if it is your special character:
if (word[0] == '#')
If not convert the word into a number.
This is what I would do:
// define a class that will read either value from a stream
class MyValue
{
public:
bool isSpec() const {return isSpecial;}
int value() const {return intValue;}
friend std::istream& operator>>(std::istream& stream, MyValue& data)
{
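std::string item;
if (!(stream >> item)) {
return stream; // nothing was read: leave data untouched, the stream is now in a failed state
}
if (item[0] == '#') {
data.isSpecial = true;
} else {
data.isSpecial = false;
data.intValue = atoi(item.c_str());
}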
return stream;
}
private:
bool isSpecial;
int intValue;
};
// Now your loop becomes:
MyValue val;
while(file >> val)
{
if (val.isSpec()) { /* Special processing */ }
else { /* We have an integer */ }
}

Maybe you can read all values as std::string and then check if it's "#" or not (and if not - convert to int)

#include <fstream>
#include <sstream>
#include <string>

int value;
std::ifstream input("data");
std::string chunk;
while (std::getline(input, chunk, '#')) {   // split the input on '#'
    std::istringstream ss(chunk);           // fresh stream for every chunk
    while (ss >> value) {                   // split the chunk on whitespace
        // process value ...
    }
}
Here we first split the input on the token '#' in the outer while loop; the inner loop then extracts the whitespace-separated integers from each piece. Note that the positions of the '#' tokens are only visible as chunk boundaries.

Personally, if your separator is always going to be space regardless of what follows, I'd recommend you just take the input as string and parse from there. That way, you can take the string, see if it's a number or a # and whatnot.

I think you should re-examine your premise that "Calling atoi() on the tokens introduces big overhead."
There is no magic to std::cin >> val. Under the hood, it ends up calling (something very similar to) atoi.
If your tokens are huge, there might be some overhead to creating a std::string but as you say, the vast majority are numbers (and the rest are #'s) so they should mostly be short.
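If you do go the string-token route, a checked conversion is worth considering, because atoi() cannot report failure. The following is a sketch only: the helper name tokenToInt is hypothetical, and it uses std::strtol so that a token such as "#" or "12x" is rejected instead of silently becoming 0 or 12.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical helper: converts a whole token to int.
// Returns false for non-numeric tokens (such as "#") and for out-of-range values.
bool tokenToInt(const std::string& token, int& out) {
    if (token.empty()) return false;
    char* end = nullptr;
    errno = 0;
    long parsed = std::strtol(token.c_str(), &end, 10);
    if (end == token.c_str() || *end != '\0') return false;  // not entirely a number
    if (errno == ERANGE || parsed < INT_MIN || parsed > INT_MAX) return false;
    out = static_cast<int>(parsed);
    return true;
}
```

Whether this is measurably slower than stream extraction on your data is something only profiling can tell.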

Related

C++ read specific range of line from file

I have the following content in a file:
A(3#John Brook)
A(2#Allies Frank)
A(1#Lucas Feider)
I want to read each line piece by piece, in order: for example, A, then 3, then John Brook. Everything is fine up to the 3, but how can I read John Brook as a string, without the "#" and ")"?
I have a function; you can have a look at my code:
void readFile()
{
ifstream read;
char process;
char index;
string data;
read.open("datas.txt");
while(true)
{
read.get(process);
read.get(index);
// Here, I need to read "John Brook" for first line.
// "Allies Frank" for second line.
// "Lucas Feider" for third line.
}
read.close();
}
First organize your data into some structure.
struct Data {
char process;
char index;
std::string data;
};
Then implement function which is able to read single item. Read separators into temporary variables and then later check if they contain proper values.
Here is an example assuming each item is in single line.
std::istream& operator>>(std::istream& in, Data& d) {
std::string l;
if (std::getline(in, l)) {
std::istringstream in_line{l};
char openParan;
char separator;
if (!std::getline(
in_line >> d.process >> openParan >> d.index >> separator,
d.data, ')') ||
openParan != '(' || separator != '#') {
in.setstate(std::ios::failbit);
}
}
return in;
}
After that rest is quick and simple.
https://godbolt.org/z/aGYvPeWfW
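For readers who cannot open the link, here is one way the remaining part might look. This is a sketch, not the code behind the link: the struct and operator>> are repeated from above only so that the block compiles on its own, and readAll is a hypothetical helper name.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Data {
    char process;
    char index;
    std::string data;
};

// Same parsing idea as the operator>> above, repeated for self-containment.
std::istream& operator>>(std::istream& in, Data& d) {
    std::string l;
    if (std::getline(in, l)) {
        std::istringstream in_line{l};
        char openParan = 0;
        char separator = 0;
        if (!std::getline(in_line >> d.process >> openParan >> d.index >> separator,
                          d.data, ')') ||
            openParan != '(' || separator != '#') {
            in.setstate(std::ios::failbit);   // malformed line
        }
    }
    return in;
}

// Hypothetical helper: collects items until the first malformed line or end of input.
std::vector<Data> readAll(std::istream& in) {
    std::vector<Data> items;
    Data d;
    while (in >> d) {
        items.push_back(d);
    }
    return items;
}
```

With a std::ifstream opened on datas.txt, readAll(file) would return one Data per line of the file.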

Stringstream Delimiter

Is there a default delimiter for stringstream? From my research, I understood that I can use it to split a string using space and comma as delimiters. But can I use other delimiters for stringstream?
Here is a C++ code snippet :
vector<int> parseInts(string str) {
// Complete this function
stringstream ss(str);
vector<int> res;
char ch;
int x;
while(ss){
ss >> x >> ch;
res.push_back(x);
}
return res;
}
This code works without me mentioning any specific delimiter. How does that happen?
There is no "delimiter" for streams at all. operator>>, on the other hand, implements its reading by delimiting on whitespace characters. For other delimiter characters, you can use std::getline() instead, eg:
vector<int> parseInts(string str) {
// Complete this function
istringstream iss(str);
vector<int> res;
int x;
string temp;
char delim = '-'; // whatever you want
while (getline(iss, temp, delim)) {
if (istringstream(temp) >> x) { // or std::stoi(), std::strtol(), etc
res.push_back(x);
}
}
return res;
}
"This code works without me mentioning any specific delimiter. How does that happen?"
Streams don't know anything about delimiters. What is happening is that, on each loop iteration, you are calling ss >> x to read the next available non-whitespace substring and convert it to an integer, and then you are calling ss >> ch to read the next available non-whitespace character following that integer. The code doesn't care what that character actually is, as long as it is not whitespace. Your loop runs until it reaches the end of the stream, or encounters a reading/conversion error.
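To make that concrete, here is a sketch of the same x-then-ch idea that only stores a value after the read has succeeded. The name parseIntsChecked is an assumption, and the sketch assumes exactly one non-whitespace separator character between numbers; with purely space-separated input, ss >> ch would swallow a digit.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical variant of parseInts: checks each extraction before using x.
std::vector<int> parseIntsChecked(const std::string& str) {
    std::istringstream ss(str);
    std::vector<int> res;
    int x;
    char ch;
    while (ss >> x) {       // stops at end of input or on a conversion error
        res.push_back(x);
        ss >> ch;           // consume one separator character, whatever it is
    }
    return res;
}
```

Because the push_back only happens after a successful ss >> x, a trailing separator such as in "1;2;" does not add a spurious element.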

Reading text with blanks and numeric data from a file

So I have data in a text like this:
Alaska 200 500
New Jersey 400 300
.
.
And I am using ifstream to open it.
This is part of a course assignment. We are not allowed to read in the whole line all at once and parse it into the various pieces. So trying to figure out how to read each part of every line.
Using >> will only read in "New" for "New Jersey" due to the white space/blank in the middle of that state name. Have tried a number of different things like .get(), .read(), .getline(). I have not been able to get the whole state name read in, and then read in the remainder of the numeric data for a given line.
I am wondering whether it is possible to read the whole line directly into a structure. Of course, structure is a new thing we are learning...
Any suggestions?
Can't you just read the state name in a loop?
Read a string from cin: if the first character of the string is numeric then you've reached the next field and you can exit the loop. Otherwise just append it to the state name and loop again.
Here is a line by line parsing solution that doesn't use any c-style parsing methods:
std::string line;
while (getline(ss, line) && !line.empty()) {
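size_t startOfNumbers = line.find_first_of("0123456789");
if (startOfNumbers == std::string::npos || startOfNumbers == 0) continue; // skip lines without a name followed by numbers
size_t endOfName = line.find_last_not_of(" ", startOfNumbers - 1);
std::string name = line.substr(0, endOfName + 1); // Extract name without trailing spaces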
std::stringstream nums(line.substr(startOfNumbers)); // Get rest of the line
int num1, num2;
nums >> num1 >> num2; // Read numbers
std::cout << name << " " << num1 << " " << num2 << std::endl;
}
If you can't use getline, do it yourself: Read and store in a buffer until you find '\n'. In this case you probably also cannot use all the groovy stuff in std::string and algorithm and might as well use good ol' C programming at that point.
Once you have grabbed a line, read your way backwards from the end of the line:
Discard all whitespace until you find non-whitespace.
Gather the characters found into token 3 until you find whitespace again.
Discard the whitespace until you find the end of token 2.
Gather token 2 until you find more whitespace.
Discard the whitespace until you find the end of token 1. The rest of the line is all token 1.
Convert token 2 and token 3 into numbers. I like to use strtol for this.
You can build all of the above or Daniel's answer (use his answer if at all possible) into an overload of operator>>. This lets you
mystruct temp;
while (filein >> temp)
{
// do something with temp. Stick it in a vector, whatever
}
The code to do this looks something like (Stealing wholesale from What are the basic rules and idioms for operator overloading? <-- Read this. It could save your life one day)
std::istream& operator>>(std::istream& is, mystruct & obj)
{
// read obj from stream
if( /* no valid object of T found in stream */ )
is.setstate(std::ios::failbit);
return is;
}
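As one possible way to fill in that skeleton without reading a whole line at once, the sketch below gathers name words with >> until the next token starts with a digit. The type StateRecord and its field names are assumptions made for illustration, and state names that begin with a digit are not handled.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical record type; the field names are assumptions.
struct StateRecord {
    std::string name;
    int first = 0;
    int second = 0;
};

std::istream& operator>>(std::istream& is, StateRecord& rec) {
    StateRecord tmp;
    std::string word;
    // Collect name words until the next token starts with a digit.
    while (is >> std::ws && !std::isdigit(is.peek())) {
        if (!(is >> word)) return is;            // end of input: stream is already failed
        if (!tmp.name.empty()) tmp.name += ' ';
        tmp.name += word;
    }
    // A record needs a name followed by two integers.
    if (tmp.name.empty() || !(is >> tmp.first >> tmp.second)) {
        is.setstate(std::ios::failbit);
        return is;
    }
    rec = tmp;                                   // commit only a fully read record
    return is;
}
```

The while (filein >> temp) loop shown earlier then works unchanged with StateRecord in place of mystruct.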
Here's another example of reading the file word by word. Edited to remove the example using the eof check as the while loop condition. I also included a struct, as you mentioned that's what you just learned. I'm not sure how you're supposed to use your struct, so I kept it simple: it contains 3 variables, a string and 2 ints. To verify that it reads correctly, the program prints the contents of the struct variables after each record is read in, which includes printing out "New Jersey" as one name.
#include <iostream>
#include <fstream>
#include <string>
#include <stdlib.h> // for atoi
using namespace std;
// Not sure how you're supposed to use the struct you mentioned. But for this example it'll just contain 3 variables to store the data read in from each line
struct tempVariables
{
std::string state;
int number1;
int number2;
};
// This will read the set of characters and return true if its a number, or false if its just string text
bool is_number(const std::string& s)
{
return !s.empty() && s.find_first_not_of("0123456789") == std::string::npos;
}
int main()
{
tempVariables temp;
ifstream file;
file.open("readme.txt");
std::string word;
std::string state;
bool stateComplete = false;
bool num1Read = false;
bool num2Read = false;
if(file.is_open())
{
while (file >> word)
{
// Check if text read in is a number or not
if(is_number(word))
{
// Here set the word (which is the number) to an int that is part of your struct
if(!num1Read)
{
// if code gets here we know it finished reading the "string text" of the line
stateComplete = true;
temp.number1 = atoi(word.c_str());
num1Read = true; // won't read the next text in to number1 var until after it reads a state again on next line
}
else if(!num2Read)
{
temp.number2 = atoi(word.c_str());
num2Read = true; // won't read the next text in to number2 var until after it reads a state again on next line
}
}
else
{
// reads in the state text
temp.state = temp.state + word + " ";
}
if(stateComplete)
{
cout<<"State is: " << temp.state <<endl;
temp.state = "";
stateComplete = false;
}
if(num1Read && num2Read)
{
cout<<"num 1: "<<temp.number1<<endl;
cout<<"num 2: "<<temp.number2<<endl;
num1Read = false;
num2Read = false;
}
}
}
return 0;
}

Why doesn't this code correctly read strings until encountering a newline?

I'm trying to write a program that reads a bunch of strings from the user, then a newline, and pushes all the strings I've read onto a stack. Here's what I have so far:
stack<string> st;
string str;
while(str != "\n")
{
cin >> str;
st.push(str);
}
However, this goes into an infinite loop and doesn't stop when I read a newline. Why is this happening? How do I fix it?
By default, the stream extraction operator (the >> operator) as applied to strings will skip over all whitespace. If you type in A B C, then a newline, then D E F, then try reading strings one at a time using the stream extraction operator, you'll get the strings "A", "B", "C", "D", "E", and "F" with no whitespace and no newlines.
If you want to read a bunch of strings until you hit a newline, you can consider using std::getline to read a line of text, then use an std::istringstream to tokenize it:
#include <sstream>
/* Read a full line from the user. */
std::string line;
if (!getline(std::cin, line)) {
// Handle an error
}
/* Tokenize it. */
std::istringstream tokenizer(line);
for (std::string token; tokenizer >> token; ) {
// Do something with the string token
}
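Applied to the stack from the question, the two steps might be combined as in the sketch below. The function name readLineOntoStack is an assumption; taking a std::istream& instead of using std::cin directly is only there to make the function easy to try out.

```cpp
#include <istream>
#include <sstream>
#include <stack>
#include <string>

// Hypothetical helper: reads one line from `in` and pushes each
// whitespace-separated word onto a stack.
std::stack<std::string> readLineOntoStack(std::istream& in) {
    std::stack<std::string> st;
    std::string line;
    if (std::getline(in, line)) {                 // stops at the newline
        std::istringstream tokenizer(line);
        for (std::string token; tokenizer >> token; ) {
            st.push(token);
        }
    }
    return st;
}
```

Calling readLineOntoStack(std::cin) reads exactly one line of user input, so the loop ends at the newline instead of waiting for more words.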
As a note - in your original code, you have a loop that generally looks like this:
string toRead;
while (allIsGoodFor(toRead)) {
cin >> toRead;
// do something with toRead;
}
This approach, in general, doesn't work because it will continue through the loop one time too many. Specifically, once you read an input that causes the condition to be false, the loop will keep processing what you've read so far. It's probably a better idea to do something like this:
while (cin >> toRead && allIsGoodFor(toRead)) {
// do something with toRead
}
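Since cin >> str skips all whitespace, str can never equal "\n", so comparing it against "\n" (in the loop condition or inside the loop) will not help. Try checking for the newline before each extraction instead:
stack<string> st;
string str;
while(cin.peek() != '\n' && cin >> str)
{
st.push(str);
}
And see if that works. Note that this assumes the newline follows the last word directly; with trailing spaces before the newline, cin >> str would skip past it.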

c++ string manipulation reversal

I am currently learning C++ and am going through how to take in a sentence as a string and reverse the words ("This is a word" becomes "word a is This", etc.).
I have looked at this method:
static string reverseWords(string const& instr)
{
istringstream iss(instr);
string outstr;
string word;
iss >> outstr;
while (iss >> word)
{
outstr = word + ' ' + outstr;
}
return outstr;
}
int main()
{
string s;
cout << "Enter sentence: ";
getline(cin, s);
string sret = reverseWords(s);
cout << reverseWords(s) << endl;
return 0;
}
I have gone through the function and kind of understand but I am a bit confused as to EXACTLY what is going on at
iss >> outstr;
while (iss >> word)
{
outstr = word + ' ' + outstr;
}
return outstr;
Can anybody explain to me the exact process that is happening that enables the words to get reversed?
Thank you very much
iss is an istringstream, and istringstreams are istreams.
As an istream, iss has the operator>>, which reads into strings from its string buffer in a whitespace-delimited manner. That is to say, it reads one whitespace-separated token at a time.
So, given the string "This is a word", the first thing it would read is "This". The next thing it would read would be "is", then "a", then "word". Then it would fail. If it fails, that puts iss into a state such that, if you test it as a bool, it evaluates as false.
So the while loop will read one word at a time. If the read succeeds, then the body of the loop prepends the word to outstr. If it fails, the loop ends.
iss is a stream, and the >> is the extraction operator. If you look upon the stream as a continuous line of data, the extraction operator removes some data from this stream.
The while loop keeps extracting words from the stream until it is empty (or, one might say, as long as the stream is good). The body of the loop adds the newly extracted word to the front of outstr.
Look up information about c++ streams to learn more.
The instruction:
istringstream iss(instr);
allows instr to be parsed when operator>> is used, separating words at whitespace characters. Each time operator>> is used, iss advances to the next word of the phrase stored in instr.
iss >> outstr; // gets the very first word of the phrase
while (iss >> word) // loop to get the rest of the words, one by one
{
outstr = word + ' ' + outstr; // and store the most recent word before the previous one, therefore reversing the string!
}
return outstr;
So the first word retrieved in the phrase is actually stored in the last position of the output string. And then all the subsequent words read from the original string will be put before the previous word read.
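To watch the reversal happen step by step, the sketch below records outstr after every extraction. The function name reverseWordsTrace is an assumption; the loop itself is the one from the question.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical tracing variant of reverseWords: returns the value of
// outstr after each word has been read.
std::vector<std::string> reverseWordsTrace(const std::string& instr) {
    std::istringstream iss(instr);
    std::vector<std::string> steps;
    std::string outstr;
    std::string word;
    if (iss >> outstr) {                 // the very first word
        steps.push_back(outstr);
    }
    while (iss >> word) {                // each later word goes in front
        outstr = word + ' ' + outstr;
        steps.push_back(outstr);
    }
    return steps;
}
```

For "This is a word" the recorded steps are "This", "is This", "a is This" and "word a is This".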