Stringstream. Detect end of line - c++

Is there a way to detect the end of a line in a stringstream?
My file:
1/2
2/3
3/4
4/5
Something like that is not working:
stringstream buffer;
buffer << file.rdbuf();
string str;
getline(buffer, str);
...
istringstream ss(str);
int num;
ss >> num;
if (ss.peek() == '/') //WORKS AS EXPECTED!
{...}
if(ss.peek() == '\n') //NOT WORKING! SKIPS THIS CONDITION.
{...}
This one produced a compiler warning:
if(ss.tellg() == -1) //WARNED!
{...}

std::istringstream has an eof() method:
Returns true if the associated stream has reached end-of-file. Specifically, returns true if eofbit is set in rdstate().
string str;
istringstream ss(str);
int num;
ss >> num;
if (ss.eof()) {...}
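A minimal sketch of why the '\n' check in the question never fires and why eof() does the job instead: std::getline removes the newline before the line reaches the per-line stream, so "end of line" and "end of stream" coincide. The helper name lineEndsAfterFraction and the "num/num" line layout are assumptions taken from the sample file in the question.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: for every line of `text`, report whether nothing but
// whitespace is left on the line after a "num/num" fraction has been read.
std::vector<bool> lineEndsAfterFraction(const std::string& text) {
    std::istringstream buffer(text);
    std::vector<bool> result;
    std::string line;
    while (std::getline(buffer, line)) {      // getline drops the '\n'
        std::istringstream ss(line);
        int numerator = 0;
        int denominator = 0;
        char slash = 0;
        ss >> numerator >> slash >> denominator;
        ss >> std::ws;                        // tolerate trailing blanks
        // No '\n' is stored in `line`, so the end of the line is the end of
        // the stream, which eof() reports.
        result.push_back(ss.eof());
    }
    return result;
}
```

Calling it on the text "1/2\n2/3 extra\n3/4\n" should report true, false, true for the three lines.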

You could always use find_first_of:
std::string str_contents = buffer.str();
if(str_contents.find_first_of('\n') != std::string::npos) {
//contains EOL
}
find_first_of('\n') returns the index of the first EOL character. If there is none, it returns std::string::npos (a very large index). If you know that there is an EOL character in your string, you can get the first line using
std::string str;
std::getline(buffer, str);
Also see NathanOliver's Answer

Related

Stringstream Delimiter

Is there a default delimiter for stringstream? From my research, I understood that I can use it to split a string using space and comma as delimiters. But can I use other delimiters for stringstream?
Here is a C++ code snippet :
vector<int> parseInts(string str) {
// Complete this function
stringstream ss(str);
vector<int> res;
char ch;
int x;
while(ss){
ss >> x >> ch;
res.push_back(x);
}
return res;
}
This code works without me mentioning any specific delimiter. How does that happen?
There is no "delimiter" for streams at all. operator>>, on the other hand, skips leading whitespace and then reads until the next whitespace character (or, for numbers, until the first character that cannot be part of the number). For other delimiter characters, you can use std::getline() instead, e.g.:
vector<int> parseInts(string str) {
// Complete this function
istringstream iss(str);
vector<int> res;
int x;
string temp;
char delim = '-'; // whatever you want
while (getline(iss, temp, delim)) {
if (istringstream(temp) >> x) { // or std::stoi(), std::strtol(), etc
res.push_back(x);
}
}
return res;
}
This code works without me mentioning any specific delimiter. How does that happen?
Streams don't know anything about delimiters. What is happening is that, on each loop iteration, you are calling ss >> x to read the next available non-whitespace substring and convert it to an integer, and then you are calling ss >> ch to read the next available non-whitespace character following that integer. The code doesn't care what that character actually is, as long as it is not whitespace. Your loop runs until it reaches the end of the stream, or encounters a reading/conversion error.
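One consequence of the explanation above: the loop in the question tests the stream before reading, so after a failed ss >> x it still pushes a value. A sketch of a variant that stores x only when the extraction succeeded (the name parseIntsChecked is made up for this example):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical variant of parseInts from the question.
std::vector<int> parseIntsChecked(const std::string& str) {
    std::istringstream ss(str);
    std::vector<int> res;
    int x = 0;
    char ch = 0;
    while (ss >> x) {        // enter the body only if an int was really read
        res.push_back(x);
        ss >> ch;            // consume one separator character, whatever it is
    }
    return res;
}
```

With this version an input such as "1,2," yields two values instead of picking up a spurious third one.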

Why doesn't this code correctly read strings until encountering a newline?

I'm trying to write a program that reads a bunch of strings from the user, then a newline, and pushes all the strings I've read onto a stack. Here's what I have so far:
stack<string> st;
string str;
while(str != "\n")
{
cin >> str;
st.push(str);
}
However, this goes into an infinite loop and doesn't stop when I read a newline. Why is this happening? How do I fix it?
By default, the stream extraction operator (the >> operator) as applied to strings will skip over all whitespace. If you type in A B C, then a newline, then D E F, then try reading strings one at a time using the stream extraction operator, you'll get the strings "A", "B", "C", "D", "E", and "F" with no whitespace and no newlines.
If you want to read a bunch of strings until you hit a newline, you can consider using std::getline to read a line of text, then use an std::istringstream to tokenize it:
#include <iostream>
#include <sstream>
#include <string>
/* Read a full line from the user. */
std::string line;
if (!getline(std::cin, line)) {
// Handle an error
}
/* Tokenize it. */
std::istringstream tokenizer(line);
for (std::string token; tokenizer >> token; ) {
// Do something with the string token
}
As a note - in your original code, you have a loop that generally looks like this:
string toRead;
while (allIsGoodFor(toRead)) {
cin >> toRead;
// do something with toRead;
}
This approach, in general, doesn't work because it will continue through the loop one time too many. Specifically, once you read an input that causes the condition to be false, the loop will keep processing what you've read so far. It's probably a better idea to do something like this:
while (cin >> toRead && allIsGoodFor(toRead)) {
// do something with toRead
}
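Putting the two pieces of this answer together for the stack from the question, a sketch could look like the following. The function name readLineIntoStack is an assumption, and it takes any std::istream so that it can be tried without typing input.

```cpp
#include <istream>
#include <sstream>
#include <stack>
#include <string>

// Hypothetical helper: read exactly one line from `in` and push each
// whitespace-separated token of that line onto a stack.
std::stack<std::string> readLineIntoStack(std::istream& in) {
    std::stack<std::string> st;
    std::string line;
    if (std::getline(in, line)) {               // stops at the newline
        std::istringstream tokenizer(line);
        for (std::string token; tokenizer >> token; ) {
            st.push(token);
        }
    }
    return st;
}
```

Passing std::cin reads one line typed by the user; the newline ends the input because std::getline, not operator>>, decides where the line stops.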
Try doing
stack<string> st;
string str;
while(str!="\n")
{
cin>>str;
if(str == "\n")
{
break;
}
st.push(str);
}
And see if that works.
And if not, then try
while ((str = cin.get()) != '\n')
instead of
while(str!="\n")

Using erase() in a while loop and segfault C++

Okay, so I'm having a bit of a problem here. The thing is this code works on a friend's computer but I'm getting segmentation faults when I try to run it.
I am reading a file looking like so:
word 2 wor ord
anotherword 7 ano oth the her erw wor ord
...
And I want to parse every word of the file. The first two words (e.g. word and 2) are to be erased, with the first one saved in another variable in the process.
I've looked around a bit on accomplishing this, and I've come up with this half-assed piece of code that seems to work on my friends' computer but not mine.
Dictionary::Dictionary() {
ifstream ip;
ip.open("words.txt", ifstream::in);
string input;
string buf;
vector<string> tokens; // Holds words
while(getline(ip, input)){
if(input != " ") {
stringstream ss(input);
while(ss >> buf) {
tokens.push_back(buf);
}
string werd = tokens.at(0);
tokens.erase(tokens.begin()); // Remove the word from the vector
tokens.erase(tokens.begin()); // Remove the number indicating trigrams
Word curr(werd, tokens);
words[werd.length()].push_back(curr); // Put the word at the vector with word length i.
tokens.clear();
}
}
ip.close();
}
What's the best way of parsing this kind of structure in a file and removing the first two elements but saving the others? As you can see, I'm making a Word object that contains a string and a vector for later use.
Regards
EDIT: It seems to add the first line fine, but on removal of the second element, it crashes with a segmentation fault error.
EDIT: words.txt contains this:
addict 4 add ddi dic ict
sinister 6 ini ist nis sin ste ter
test 2 est tes
cplusplus 7 cpl lus lus plu plu spl usp
Without leading blank spaces or ending blanks. Not that it reads all the way anyway.
Word.cc:
#include <string>
#include <vector>
#include <algorithm>
#include "word.h"
using namespace std;
Word::Word(const string& w, const vector<string>& t) : word(w), trigrams(t) {}
string Word::get_word() const {
return word;
}
unsigned int Word::get_matches(const vector<string>& t) const {
vector<string> sharedTrigrams;
set_intersection(t.begin(),t.end(), trigrams.begin(), trigrams.end(), back_inserter(sharedTrigrams));
return sharedTrigrams.size();
}
First of all, there is an error in the number of closing }s in your posted code. If you indent them properly, you will see that your code is:
while(getline(ip, input))
{
if(input != " ")
{
stringstream ss(input);
while(ss >> buf) {
tokens.push_back(buf);
}
}
string werd = tokens.at(0);
tokens.erase(tokens.begin());
tokens.erase(tokens.begin());
Word curr(werd, tokens);
words[werd.length()].push_back(curr);
tokens.clear();
}
}
Assuming that is a small typo in posting, the other problem is that tokens is an empty list when input == " " yet you continue to use tokens as though it has 2 or more items in it.
You can fix that by moving everything inside the if statement.
while(getline(ip, input))
{
if(input != " ")
{
stringstream ss(input);
while(ss >> buf) {
tokens.push_back(buf);
}
string werd = tokens.at(0);
tokens.erase(tokens.begin());
tokens.erase(tokens.begin());
Word curr(werd, tokens);
words[werd.length()].push_back(curr);
tokens.clear();
}
}
I would add further checks to make it more robust.
while(getline(ip, input))
{
if(input != " ")
{
stringstream ss(input);
while(ss >> buf) {
tokens.push_back(buf);
}
string werd;
if ( !tokens.empty() )
{
werd = tokens.at(0);
tokens.erase(tokens.begin());
}
if ( !tokens.empty() )
{
tokens.erase(tokens.begin());
}
Word curr(werd, tokens);
words[werd.length()].push_back(curr);
tokens.clear();
}
}
You forgot to include the initialization of the variable "words" in your code. Just looking at it, I am guessing you are initializing "words" to be a fixed-length array of vectors, but then read a word that is off the end of the array. Bang, you're dead. Add a check to "werd.length()" to ensure it is strictly less than the length of "words".
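The declaration of "words" is not shown in the question, so the following is only a sketch of the suggested check under the assumption that it is an indexable container of vectors. Buckets and addWordByLength are made-up names, and plain strings stand in for Word objects.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the container called "words" in the question.
using Buckets = std::vector<std::vector<std::string>>;

// Stores werd in the bucket for its length; refuses if that bucket does not exist.
bool addWordByLength(Buckets& words, const std::string& werd) {
    if (werd.length() >= words.size()) {
        return false;                      // index would be past the end
    }
    words[werd.length()].push_back(werd);
    return true;
}
```

With five buckets (indices 0 to 4), "test" is accepted and "addict" is rejected instead of writing past the end.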
ifstream ip;
ip.open("words.txt", ifstream::in);
string input;
while(getline(ip, input)){
istringstream iss(input);
string str;
unsigned int count = 0;
if(iss >> str >> count) {
vector<string> tokens { istream_iterator<string>(iss), istream_iterator<string>() }; // Holds words
if(tokens.size() == count)
words[str.length()].emplace_back(str, tokens);
}
}
ip.close();
This is what I used to make it work.

A real solution to the 'cin' and 'getline' issue

How do I get rid of the leading ' ' and '\n' symbols when I'm not sure I'll get a cin, before the getline?
Example:
int a;
char s[1001];
if(rand() == 1){
cin >> a;
}
cin.getline(s, sizeof s); // the member getline needs the buffer size
If I put a cin.ignore() before the getline, I may lose the first symbol of the string, so is my only option to put it after every use of 'cin >>'? That's not a very efficient way to do it when you are working on a big project.
Is there a better way than this:
int a;
string s;
if(rand() == 1){
cin >> a;
}
do getline(cin, s); while(s == "");
Like this:
std::string line, maybe_an_int;
if (rand() == 1)
{
    if (!std::getline(std::cin, maybe_an_int))
    {
        std::exit(EXIT_FAILURE);
    }
}
if (!std::getline(std::cin, line))
{
    std::exit(EXIT_FAILURE);
}
int a = std::stoi(maybe_an_int); // this may throw an exception
You can parse the string maybe_an_int in several different ways. You could also use std::strtol, or a string stream (under the same condition as the first if block):
std::istringstream iss(maybe_an_int);
int a;
if (!(iss >> a >> std::ws) || iss.get() != EOF)
{
std::exit(EXIT_FAILURE);
}
You could of course handle parsing errors more gracefully, e.g. by running the entire thing in a loop until the user inputs valid data.
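A sketch of the retry loop mentioned above, combining line-based reading with the string-stream check. The name readIntLine is an assumption, and the function takes a std::istream so that std::cin or a test stream can be passed in.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: keep reading lines until one consists of a single int
// (optionally surrounded by whitespace). Returns false if input runs out first.
bool readIntLine(std::istream& in, int& a) {
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream iss(line);
        int candidate = 0;
        // Accept only "int, then nothing but whitespace up to the end".
        if (iss >> candidate && (iss >> std::ws).eof()) {
            a = candidate;
            return true;
        }
        // Otherwise fall through and try the next line.
    }
    return false;
}
```

Because every attempt consumes a whole line, no stray newline is left behind for a later getline call.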
Both the space character and the newline character are classified as whitespace by standard IOStreams. If you are mixing formatted I/O with unformatted I/O and you need to clear the stream of residual whitespace, use the std::ws manipulator:
if (std::getline(std::cin >> std::ws, s)) {
}
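To see the effect without typing input, here is a small sketch that wraps the same call in a function taking any std::istream. The name readLineSkippingWs is made up for this example.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read the next line, first discarding any whitespace
// (including a leftover '\n' from an earlier >>) that precedes it.
bool readLineSkippingWs(std::istream& in, std::string& s) {
    return static_cast<bool>(std::getline(in >> std::ws, s));
}
```

It behaves the same whether or not a formatted read came before it, which is the situation in the question. The trade-off is that leading spaces belonging to the line itself are discarded too, and empty lines are skipped.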

Tokenize stringstream based on type

I have an input stream containing integers and a character with special meaning, '#'. It looks as follows:
... 12 18 16 # 22 24 26 15 # 17 # 32 35 33 ...
The tokens are separated by space. There's no pattern for the position of '#'.
I was trying to tokenize the input stream like this:
int value;
std::ifstream input("data");
if (input.good()) {
string line;
while(getline(data, line) != EOF) {
if (!line.empty()) {
sstream ss(line);
while (ss >> value) {
//process value ...
}
}
}
}
The problem with this code is that the processing stops when the first '#' is encountered.
The only solution I can think of is to extract each individual token into a string, check that it is not '#', and use the atoi() function to convert the string to an integer. However, that seems very inefficient, as the majority of tokens are integers. Calling atoi() on the tokens introduces big overhead.
Is there a way I can parse the individual token by its type? ie, for integers, parse it as integers while for '#', skip it. Thanks!
One possibility would be to explicitly skip whitespace (ss >> std::ws), and then to use ss.peek() to find out if a # follows. If yes, use ss.get() to read it and continue, otherwise use ss >> value to read the value.
If the positions of # don't matter, you could also remove all '#' from the line before initializing the stringstream with it.
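A sketch of the peek-based approach for a single line, assuming tokens are either integers or '#'. ParsedLine and parseLine are names invented for this example, and the marker count is just a placeholder for whatever special processing '#' needs.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: the integers found and the number of '#' seen.
struct ParsedLine {
    std::vector<int> values;
    int markers = 0;
};

ParsedLine parseLine(const std::string& line) {
    ParsedLine out;
    std::istringstream ss(line);
    while (ss >> std::ws) {                       // explicitly skip whitespace
        const int next = ss.peek();               // look without consuming
        if (next == std::istringstream::traits_type::eof()) {
            break;                                // nothing left on the line
        }
        if (next == '#') {
            ss.get();                             // consume the marker
            ++out.markers;
        } else {
            int value = 0;
            if (!(ss >> value)) {
                break;                            // neither '#' nor an integer
            }
            out.values.push_back(value);
        }
    }
    return out;
}
```

Integers are parsed directly by operator>> here, so no intermediate string is created per token.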
It is usually not worth testing against good():
if (input.good()) {
The test only helps if your next operation generates an error message or throws an exception. If the stream is not good, all further operations will fail anyway.
Don't test against EOF.
while(getline(data, line) != EOF) {
The result of std::getline() is not an integer. It is a reference to the input stream. The input stream is convertible to a bool-like object that can be used in a boolean context (such as while or if). So what you want to do is:
while(getline(data, line)) {
I am not sure I would read a whole line. Since the input is space-separated, you could just read one word at a time using the >> operator on a std::string:
std::string word;
while(data >> word) { // reads one space separated word
Now you can test the word to see if it is your special character:
if (word[0] == '#')
If not, convert the word into a number.
This is what I would do:
// define a class that will read either value from a stream
class MyValue
{
public:
bool isSpec() const {return isSpecial;}
int value() const {return intValue;}
friend std::istream& operator>>(std::istream& stream, MyValue& data)
{
std::string item;
if (stream >> item) { // leave data untouched when no token could be read
if (item[0] == '#') {
data.isSpecial = true;
} else {
data.isSpecial = false;
data.intValue = atoi(item.c_str());
}
}
return stream;
}
private:
bool isSpecial;
int intValue;
};
// Now your loop becomes:
MyValue val;
while(file >> val)
{
if (val.isSpec()) { /* Special processing */ }
else { /* We have an integer */ }
}
Maybe you can read all values as std::string and then check if it's "#" or not (and if not - convert to int)
int value;
std::ifstream input("data");
if (input) {
    std::string line;
    std::stringstream ss;
    std::stringstream ss2;
    while (std::getline(input, line, '#')) {   // split on '#'
        ss.clear();                            // reset eofbit/failbit from the previous pass
        ss.str(line);
        while (std::getline(ss, line, ' ')) {  // split on ' '
            ss2.clear();
            ss2.str(line);
            if (ss2 >> value) {                // empty pieces are skipped
                // process value ...
            }
        }
    }
}
Here we first split the input at each '#' in the outer while loop, then split each piece at ' ' in the inner while loop.
Personally, if your separator is always going to be space regardless of what follows, I'd recommend you just take the input as string and parse from there. That way, you can take the string, see if it's a number or a # and whatnot.
I think you should re-examine your premise that "Calling atoi() on the tokens introduces big overhead."
There is no magic to std::cin >> val. Under the hood, it ends up calling (something very similar to) atoi.
If your tokens are huge, there might be some overhead to creating a std::string but as you say, the vast majority are numbers (and the rest are #'s) so they should mostly be short.
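For the read-as-string approach, a sketch of a checked conversion using std::strtol, which, unlike atoi, reports whether the whole token was a number. The helper name tokenToInt is an assumption.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical helper: convert a whitespace-free token to int.
// Returns false for "#", for trailing garbage such as "12x", and for
// values that do not fit in an int.
bool tokenToInt(const std::string& token, int& out) {
    if (token.empty()) {
        return false;
    }
    char* end = nullptr;
    errno = 0;
    const long parsed = std::strtol(token.c_str(), &end, 10);
    if (end == token.c_str() || *end != '\0') {
        return false;                 // no digits, or leftover characters
    }
    if (errno == ERANGE || parsed < INT_MIN || parsed > INT_MAX) {
        return false;                 // out of range
    }
    out = static_cast<int>(parsed);
    return true;
}
```

A reading loop can then do while (input >> word) and treat any token for which tokenToInt returns false as the special case.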