A std::getline with a list of stopping characters - C++

I'd like a std::getline-like function that stops when it encounters any of the characters listed in a string, so I came up with the following:
std::istream& read_until(std::istream& is, std::string& s, const std::string& list) {
    s.clear();
    while (is.peek() && is && list.find(is.peek()) == list.npos) {
        s += is.get();
    }
    return is;
}
Leaving the terminating character in the stream is the desired behavior. This works, but it's ugly and doesn't feel like the right way to go. Do you see any clear mistakes, or do you have a better way of handling this?
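On the "clear mistake" part: at end of input, is.peek() returns EOF, which is non-zero, and peek() only sets eofbit, so "is" still tests true at that point. The body therefore runs one extra time and appends char(EOF) to s. The is.peek() test also stops on a legitimate '\0' character. A minimal sketch that compares against traits_type::eof() explicitly; setting failbit when end of input is reached with nothing extracted is an assumption (borrowed from std::getline) so that the function can drive a loop:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Reads characters into s until one of the characters in `stops` is next,
// or the input runs out. The stopping character is left in the stream.
std::istream& read_until(std::istream& is, std::string& s, const std::string& stops) {
    using traits = std::istream::traits_type;
    s.clear();
    for (;;) {
        const traits::int_type c = is.peek();  // a character value, or eof()
        if (traits::eq_int_type(c, traits::eof())) {
            if (s.empty())
                is.setstate(std::ios::failbit);  // nothing extracted before end of input
            break;
        }
        if (stops.find(traits::to_char_type(c)) != std::string::npos)
            break;  // leave the stopping character in the stream
        s += traits::to_char_type(is.get());
    }
    return is;
}
```

The caller still has to consume the stopping character (for example with is.get()) before calling read_until again; otherwise the next call returns an empty string immediately.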

Related

Read only certain characters from inputstream (C++)

I have a class BitSet with a data member d_bits. d_bits is an object that has member functions readString and resize. Now I want to define the extraction operator (operator>>) for BitSet, which ignores leading whitespace, then reads zeros and ones from the istream and stops once a different character is encountered. I wrote the following cumbersome function:
istream &operator>>(istream &in, BitSet &bitSet){
    while (in.peek() == ' ')
        in.ignore();
    string bits;
    char ch;
    while (in.peek() == '1' || in.peek() == '0'){
        in.get(ch);
        bits.append(1, ch);
    }
    bitSet.d_bits.resize(bits.length());
    bitSet.d_bits.readString(bits);
    return in;
}
It works correctly, but I am unhappy with it for several reasons. Ignoring whitespace characters one by one seems unnecessarily tedious. Also, once the whitespaces are ignored, reading the characters one by one and then appending them to a string also seems excessively slow. I looked around for the member functions of istreams that might be convenient but I could not find a better way of doing this. So is this the preferred way of extracting only a part of the contents of an istream?
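Note that the whitespace loop above only skips ' ', not tabs or newlines. A minimal sketch, assuming a plain std::string stands in for d_bits (its resize/readString belong to the asker's class and are not shown) and using the hypothetical name read_bits: the std::ws manipulator skips all leading whitespace in one step, and the bit characters are then collected with peek()/get(). As far as I know, istream has no member that reads "while the next character is in a set", so this still reads one character at a time.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: skips leading whitespace, then collects '0'/'1'
// characters until anything else (or end of input) is seen. The first
// non-bit character is left in the stream.
std::istream& read_bits(std::istream& in, std::string& bits) {
    bits.clear();
    in >> std::ws;  // skip spaces, tabs and newlines in one step
    for (int c = in.peek(); c == '0' || c == '1'; c = in.peek())
        bits += static_cast<char>(in.get());
    if (bits.empty())
        in.setstate(std::ios::failbit);  // no bits at all: report failure
    return in;
}
```

An operator>> for BitSet could call this and then hand the string to d_bits.resize and d_bits.readString as in the original.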

while loop with comma operator versus duplicate code versus “break;”

After reading a great answer about the comma operator in C/C++ (What does the comma operator do, from which I also borrow the example code), I wanted to know which is the most readable, maintainable, preferred way to implement a while loop. Specifically, a while loop whose condition depends on an operation or calculation, and where the condition might be false the first time (if the loop always ran at least once, a do-while would work fine).
Is the comma version the most preferred? (how about an answer for each, and the rest can vote by upvoting accordingly?)
Simple Implementation
This code has duplicate statements that (most likely) must always stay the same.
string s;
read_string(s); // first call to set up the condition
while(s.len() > 5) // might be false the first pass
{
    //do something
    read_string(s); // subsequent identical code to update the condition
}
Implementation using break
string s;
while(1) // this looks like trouble
{
    read_string(s);
    if(s.len() <= 5) break; // hmmm, where else might this loop exit
    //do something
}
Implementation using comma
string s;
while( read_string(s), s.len() > 5 )
{
    //do something
}
I would say none of the above. I see a couple of options. The choice between them depends on your real constraints.
One possibility is that you have a string that should always have some minimum length. If that's the case, you can define a class that embodies that requirement:
template <size_t min>
class MinString{
    std::string data;
public:
    friend std::istream &operator>>(std::istream &is, MinString &m) {
        std::string s;
        read_string(is, s); // rewrite read_string to take an istream & as a parameter
        if (s.length() >= min)
            m.data = s;
        else
            is.setstate(std::ios::failbit);
        return is;
    }
    operator std::string() { return data; }
    // depending on needs, maybe more here such as assignment operator
    // and/or ctor that enforce the same minimum length requirement
};
This leads to code something like this:
MinString<5> s;
while (infile >> s)
    process(s);
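A runnable version of the MinString sketch, under one explicit assumption: the hypothetical read_string(is, s) is replaced by an ordinary whitespace-delimited is >> s. The conversion operator is also marked const. Note that a word which is too short is still consumed before failbit is set.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// MinString sketch; `is >> s` stands in for the hypothetical read_string.
template <std::size_t Min>
class MinString {
    std::string data;
public:
    friend std::istream& operator>>(std::istream& is, MinString& m) {
        std::string s;
        if (is >> s) {
            if (s.length() >= Min)
                m.data = s;
            else
                is.setstate(std::ios::failbit);  // too short: treat as a failed read
        }
        return is;
    }
    operator std::string() const { return data; }
};
```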
Another possibility is that you have normal strings, but under some circumstances you need to do a read that must be at least 5 characters. In this case the enforcement should be in a function rather than the type.
bool read_string_min(std::string &s, size_t min_len) {
    read_string(s);
    return s.length() >= min_len;
}
Again, with this the loop can be simple and clean:
while (read_string_min(s, 5))
    process(s);
It's also possible to just write a function that returns the length that was read, and leave enforcement of the minimum length in the while loop:
while (read_string(s) > 5)
    process(s);
Some people like this on the grounds that it fits the single responsibility principle better. IMO, "read a string of at least 5 characters" qualifies perfectly well as a single responsibility, so that strikes me as a weak argument at best (though even this design makes it easy to write the code cleanly).
Summary: anything that does input should either implicitly or explicitly provide some way of validating that it read the input correctly. Something that just attempts to read some input but provides no indication of success/failure is simply a poor design (and it's that apparent failure in the design of your read_string that's leading to the problem you've encountered).
There is a fourth option that seems better to me:
string s;
while( read_string(s) && s.len() > 5 )
{
    //do something
}
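A sketch of that fourth option, assuming a hypothetical read_string that takes the stream as a parameter (as the first answer suggests) and returns whether the read succeeded; here it simply reads one whitespace-delimited word, and count_long_words is likewise only for illustration. Because && short-circuits, the length is checked only after a successful read.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical read_string that reports success, so it can sit in the
// loop condition directly.
bool read_string(std::istream& is, std::string& s) {
    return static_cast<bool>(is >> s);
}

// Counts words while they are longer than 5 characters; stops at the
// first short word or at end of input.
std::size_t count_long_words(std::istream& is) {
    std::size_t n = 0;
    std::string s;
    while (read_string(is, s) && s.length() > 5)
        ++n;  // "do something" with s here
    return n;
}
```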

Reading a txt file

For my project I have to overload operator>> to read in an array of numbers from a text file. This is my first time doing any of this and I am pretty lost. My code so far looks like this:
std::istream& operator>>(std::istream& in, bigint array){
    bool semi = false ;
    while(!semi){
        if(get() = ';')
            semi = true ;
        in <<get();
    }
    return in ;
}
And the file looks like this.
10000000000000000000000000000000000345;
299793000000
00000000000000000000067;
4208574289572473098273498723475;
28375039287459832728745982734509872340985729384750928734590827098752938723;
99999999; 99999999;
Each new array ends when it hits a ';'. The whitespace and line endings are confusing me too. Any help would be appreciated, thank you.
You will want to use
bigint& array
to take the value by reference (otherwise the digits you read would go into a local copy and be lost when the function returns).
Also, you will want to use
char ch;
in >> ch;
instead of in << get() (which doesn't compile). Better yet, add error handling:
if (!(in >> ch))
{
    // we're in trouble (in.rdstate(), in.eof(), in.fail() or in.bad() to know more)
}
If you wanted to use in.get(), you should be prepared to skip your own whitespace (including newlines). I'd prefer std::istream_iterator here, because it will automatically do so (if the std::ios::skipws flag is in effect, which it is, by default).
So here's the simplest approach (one that mostly assumes the input data is valid and whitespace is ignorable):
#include <vector>
#include <istream>
#include <iterator>
struct bigint
{
    std::vector<char> v; // or whatever representation you use (binary? bcd?)
};
std::istream& operator>>(std::istream& in, bigint& array)
{
    for (std::istream_iterator<char> f(in), l; f != l; ++f) {
        if (*f>='0' && *f<='9')
            array.v.push_back(*f - '0');
        else if (*f==';')
            break;
        else
            throw "invalid input";
    }
    return in;
}
#include <iostream>
#include <sstream>
int main()
{
    std::istringstream iss(
        "10000000000000000000000000000000000345;\n"
        "299793000000\n"
        "00000000000000000000067;\n"
        "4208574289572473098273498723475;\n"
        "28375039287459832728745982734509872340985729384750928734590827098752938723;\n"
        "99999999; 99999999;\n");
    bigint value;
    while (value.v.clear(), iss >> value)
        std::cout << "read " << value.v.size() << " digits\n";
}
There's quite a lot of confusion here. I'll just list some points, but you have a way to go even if you fix these things.
What exactly are you reading? You say you are reading an array of numbers, but your code says this
std::istream& operator>>(std::istream& in, bigint array){
I might be wrong but bigint sounds like a single number to me. I would expect something like this
std::istream& operator>>(std::istream& in, std::vector<bigint>& array){
Which brings me to the second point: operator>> is expected to modify its second argument, which means it cannot be passed by value; you must use a reference. In other words, this is wrong
std::istream& operator>>(std::istream& in, X x){
but this is OK
std::istream& operator>>(std::istream& in, X& x){
You are trying to read an array of bigints, so you need a loop (you have that), and each time round the loop you read one bigint. So you need a way to read one bigint; do you have that? There's nothing in your question or your code that indicates you can read a bigint, but it's obviously crucial that you can. If you don't have any code yet to read a bigint, forget about this whole exercise until you do, and work on that first. Only when you can read one bigint should you come back to the problem of reading an array of bigints.
The other tricky part is the stopping condition, you stop when you read a semi-colon (possibly preceded by whitespace). So you need a way to read the next non-space character, and crucially you need a way to unread it if it turns out not to be a semicolon. So you need something like this
if (!(in >> ch) || ch == ';')
{
    // quit, either no more input, or the next char is a semicolon
    break;
}
in.putback(ch); // unread the last char read
// read the next bigint
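Putting the semicolon check and putback together, a minimal sketch: long long stands in for bigint (an assumption, since no bigint reader exists yet) and read_list is a hypothetical name. Each call reads one ';'-terminated group of whitespace-separated numbers.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <vector>

// Reads numbers until a ';' (or end of input). long long stands in for bigint.
std::istream& read_list(std::istream& in, std::vector<long long>& out) {
    out.clear();
    char ch;
    for (;;) {
        if (!(in >> ch) || ch == ';')
            break;           // no more input, or the terminating semicolon
        in.putback(ch);      // not a semicolon: unread it
        long long value;
        if (!(in >> value))
            break;           // not a number: failbit stays set for the caller
        out.push_back(value);
    }
    return in;
}
```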
Hope this helps.

C++: std::istream check for EOF without reading / consuming tokens / using operator>>

I would like to test if a std::istream has reached the end without reading from it.
I know that I can check for EOF like this:
if (is >> something)
but this has a series of problems. Imagine there are many, possibly virtual, methods/functions which expect std::istream& passed as an argument.
This would mean I have to do the "housework" of checking for EOF in each of them, possibly with a different type of "something" variable each time, or create some weird wrapper to handle calling the input methods.
All I need to do is:
if (!IsEof(is)) Input(is);
The IsEof method should guarantee that the stream is left unchanged for reading, so that the above line is equivalent to:
Input(is)
as regards the data read in the Input method.
If there is no generic solution that works for any std::istream, is there any way to do this for std::ifstream or cin?
EDIT:
In other words, the following assert should always pass:
while (!IsEof(is)) {
    int something;
    assert(is >> something);
}
The istream class has an eof bit that can be checked by using the is.eof() member.
Edit: So you want to see if the next character is the EOF marker without removing it from the stream? if (is.peek() == EOF) is probably what you want then. See the documentation for istream::peek
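A sketch of the peek() approach, using the question's hypothetical name IsEof. peek() extracts nothing, though it does set eofbit when it sees the end. The usage below also shows why the assert from the question can still fire: with a trailing space, the next character is not EOF, yet no further int can be read.

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// True if the next character is end-of-file. Nothing is extracted.
bool IsEof(std::istream& is) {
    return is.peek() == std::istream::traits_type::eof();
}
```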
That's impossible. How is the IsEof function supposed to know that the next item you intend to read is an int?
Should the following also not trigger any asserts?
while(!IsEof(in))
{
    int x;
    double y;
    if( rand() % 2 == 0 )
    {
        assert(in >> x);
    } else {
        assert(in >> y);
    }
}
That said, you can use the exceptions method to keep the "housekeeping" in one place.
Instead of
if(!IsEof(is)) Input(is);
try
is.exceptions( ifstream::eofbit /* | ifstream::failbit etc. if you like */ );
try {
    Input(is);
} catch(const ifstream::failure& ) {
}
It doesn't stop you from reading before it's "too late", but it does obviate the need to have if(is >> x) if(is >> y) etc. in all the functions.
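A runnable sketch of the exceptions approach. It enables exceptions on failbit rather than eofbit, which is a deliberate change: with eofbit in the mask, reading a final value that ends exactly at the end of input sets eofbit and can itself throw, skipping whatever Input would do with that value. Input and read_all are hypothetical names. The handler catches std::exception, which is a base class of std::ios_base::failure.

```cpp
#include <cassert>
#include <exception>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical Input: reads ints with no per-read checks. It relies on
// exceptions being enabled to leave the loop.
void Input(std::istream& is, std::vector<int>& out) {
    int x;
    for (;;) {
        is >> x;  // throws when a read fails
        out.push_back(x);
    }
}

// The housekeeping lives in one place.
std::vector<int> read_all(std::istream& is) {
    std::vector<int> values;
    is.exceptions(std::ios::failbit);
    try {
        Input(is, values);
    } catch (const std::exception&) {
        // std::ios_base::failure: a read failed, treat it as end of data
    }
    is.exceptions(std::ios::goodbit);  // restore the default: no exceptions
    return values;
}
```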
Normally,
if (is)
{
}
is enough. There are also .good(), .bad(), .fail() and .eof() for more precise information.
Here is a reference link: http://www.cplusplus.com/reference/iostream/istream/
There are good reasons why there is no isEof function: it is hard to specify in a usable way. For instance, operator>> usually begins by skipping whitespace (depending on a flag), while some other input functions are able to read spaces. How would your isEof() handle that? Would it begin by skipping spaces or not? Would it depend on the flag used by operator>>? Would it restore the skipped whitespace to the stream?
My advice is to use the standard idiom and characterize input failures after the fact, instead of trying to predict just one of their causes: you'd still need to characterize and handle the others.
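A sketch of that advice: read with the usual loop, then inspect the stream state to find out why it stopped. read_ints and the returned strings are illustrative only.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads ints with the standard idiom, then reports why the loop stopped.
std::string read_ints(std::istream& is, std::vector<int>& out) {
    int x;
    while (is >> x)
        out.push_back(x);
    if (is.bad())
        return "I/O error";        // the stream itself is broken
    if (is.eof())
        return "end of input";     // ran out of data: normal termination
    return "malformed input";      // failbit without eofbit: bad token
}
```

One limitation: a truncated final token, such as a lone '-' at the very end, also sets eofbit, so this sketch would report it as end of input.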
No, in the general case there is no way of knowing if the next read operation will reach eof.
If the stream is connected to a keyboard, the EOF condition is that I will type Ctrl+Z/Ctrl+D at the next prompt. How would IsEof(is) detect that?

Ways std::stringstream can set fail/bad bit?

A common piece of code I use for simple string splitting looks like this:
inline std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    std::stringstream ss(s);
    std::string item;
    while(std::getline(ss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}
Someone mentioned that this will silently "swallow" errors occurring in std::getline, and of course I agree that's the case. But it occurred to me: what could possibly go wrong here in practice that I would need to worry about? Basically, it all boils down to this:
inline std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    std::stringstream ss(s);
    std::string item;
    while(std::getline(ss, item, delim)) {
        elems.push_back(item);
    }
    if(/* what error can I catch here? */) {
        // *** How did we get here!? ***
    }
    return elems;
}
A stringstream is backed by a string, so we don't have to worry about any of the issues associated with reading from a file. There is no type conversion going on here, since getline simply reads until it sees the delimiter or EOF. So we can't get any of the errors that something like boost::lexical_cast has to worry about.
I simply can't think of something besides failing to allocate enough memory that could go wrong, but that'll just throw a std::bad_alloc well before the std::getline even takes place. What am I missing?
I can't imagine what errors this person thinks might happen, and you should ask them to explain. Nothing can go wrong except allocation errors, as you mentioned, which are thrown and not swallowed.
The only thing I see that you're directly missing is that ss.fail() is guaranteed to be true after the while loop, because that's the condition being tested. (bool(stream) is equivalent to !stream.fail(), not stream.good().) As expected, ss.eof() will also be true, indicating the failure was due to EOF.
However, there might be some confusion over what is actually happening. Because getline uses delim-terminated fields rather than delim-separated fields, input data such as "a\nb\n" has two instead of three fields, and this might be surprising. For lines this makes complete sense (and is POSIX standard), but how many fields, with a delim of '-', would you expect to find in "a-b-" after splitting?
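To check both claims, a sketch that runs the question's getline loop and also records the stream state after it (split_getline and SplitResult are illustrative names). For "a-b-" it yields two fields, and both fail() and eof() are true once the loop ends.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

struct SplitResult {
    std::vector<std::string> fields;
    bool failed;  // ss.fail() after the loop
    bool at_eof;  // ss.eof() after the loop
};

// Same loop as split() in the question, plus a record of the final stream state.
SplitResult split_getline(const std::string& s, char delim) {
    SplitResult r;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim))
        r.fields.push_back(item);
    r.failed = ss.fail();
    r.at_eof = ss.eof();
    return r;
}
```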
Incidentally, here's how I'd write split:
template<class OutIter>
OutIter split(std::string const& s, char delim, OutIter dest) {
    std::string::size_type begin = 0, end;
    while ((end = s.find(delim, begin)) != s.npos) {
        *dest++ = s.substr(begin, end - begin);
        begin = end + 1;
    }
    *dest++ = s.substr(begin);
    return dest;
}
This avoids all of the problems with iostreams in the first place, avoids extra copies (the stringstream's backing string; plus the temp returned by substr can even use a C++0x rvalue reference for move semantics if supported, as written), has the behavior I expect from split (different from yours), and works with any container.
deque<string> c;
split("a-b-", '-', back_inserter(c));
// c == {"a", "b", ""}