while loop with comma operator versus duplicate code versus “break;” - c++

After reading a great answer about the comma operator in C/C++ (What does the comma operator do - and I use the same example code), I wanted to know which is the most readable, maintainable, preferred way to implement a while loop. Specifically a while loop whose condition depends on an operation or calculation, and the condition might be false the first time (if the loop were to always pass at least once then the do-while would work fine).
Is the comma version the most preferred? (how about an answer for each, and the rest can vote by upvoting accordingly?)
Simple Implementation
This code has duplicate statements that (most likely) must always stay identical.
string s;
read_string(s); // first call to set up the condition
while(s.len() > 5) // might be false the first pass
{
//do something
read_string(s); // subsequent identical code to update the condition
}
Implementation using break
string s;
while(1) // this looks like trouble
{
read_string(s);
if(s.len() <= 5) break; // hmmm, where else might this loop exit
//do something
}
Implementation using comma
string s;
while( read_string(s), s.len() > 5 )
{
//do something
}

I would say none of the above. I see a couple of options. The choice between them depends on your real constraints.
One possibility is that you have a string that should always have some minimum length. If that's the case, you can define
a class that embodies that requirement:
template <size_t min>
class MinString{
std::string data;
public:
friend std::istream &operator>>(std::istream &is, MinString &m) {
std::string s;
read_string(is, s); // rewrite read_string to take an istream & as a parameter
if (s.length() >= min)
m.data = s;
else
is.setstate(std::ios::failbit);
return is;
}
operator std::string() { return data; }
// depending on needs, maybe more here such as assignment operator
// and/or ctor that enforce the same minimum length requirement
};
This leads to code something like this:
MinString<5> s;
while (infile >> s)
process(s);
Another possibility is that you have normal strings, but under some circumstances you need to do a read that must be at
least 5 characters. In this case the enforcement should be in a function rather than the type.
bool read_string_min(std::string &s, size_t min_len) {
read_string(s);
return s.length() >= min_len;
}
Again, with this the loop can be simple and clean:
while (read_string_min(s, 5))
process(s);
It's also possible to just write a function that returns the length that was read, and leave enforcement of the minimum
length in the while loop:
while (read_string(s) > 5)
process(s);
Some people prefer this on the grounds that it fits the single responsibility principle better. IMO, "read a string of at least 5 characters" qualifies perfectly well as a single responsibility, so that strikes me as a weak argument at best (though even this design still makes it easy to write the code cleanly).
Summary: anything that does input should either implicitly or explicitly provide some way of validating that it read the input correctly. Something that just attempts to read some input but provides no indication of success/failure is simply a poor design (and it's that apparent failure in the design of your read_string that's leading to the problem you've encountered).
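A minimal sketch of the function-based variant, assuming read_string is replaced by whitespace-delimited extraction from a std::istream. The original read_string is not shown, so the stream parameter and the helper collect_long_words are hypothetical names used only for illustration:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical reader: extracts one whitespace-delimited token per call.
// Returns true only if a token was read AND it has at least min_len characters,
// so the caller gets a success/failure indication from the read itself.
bool read_string_min(std::istream& in, std::string& s, std::size_t min_len) {
    return (in >> s) && s.length() >= min_len;
}

// The loop needs no duplicated read, no break and no comma operator.
std::vector<std::string> collect_long_words(std::istream& in, std::size_t min_len) {
    std::vector<std::string> out;
    std::string s;
    while (read_string_min(in, s, min_len))
        out.push_back(s);
    return out;
}
```

With the input "abcdef ghijklm abc nopqrst" and a minimum of 5, the loop collects the first two words and stops at "abc", which is the same termination rule as the loops in the question.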

There is a fourth option that seems better to me:
string s;
while( read_string(s) && s.len() > 5 )
{
//do something
}

Related

Using templates for implementing a generic string parser

I am trying to come up with a generic solution for parsing strings (with a given format). For instance, I would like to be able to parse a string containing a list of numeric values (integers or floats) and return a std::vector. This is what I have so far:
template<typename T, typename U>
T parse_value(const U& u) {
throw std::runtime_error("no parser available");
}
template<typename T>
std::vector<T> parse_value(const std::string& s) {
std::vector<std::string> parts;
boost::split(parts, s, boost::is_any_of(","));
std::vector<T> res;
std::transform(parts.begin(), parts.end(), std::back_inserter(res),
[](const std::string& s) { return boost::lexical_cast<T>(s); });
return res;
}
Additionally, I would like to be able to parse strings containing other type of values. For instance:
struct Foo { /* ... */ };
template<>
Foo parse_value(const std::string& s) {
/* parse string and return a Foo object */
}
The reason to maintain a single "hierarchy" of parse_value functions is that, sometimes, I want to parse an optional value (which may or may not exist), using boost::optional. Ideally, I would like to have just a single parse_optional_value function that delegates to the corresponding parse_value function:
template<typename T>
boost::optional<T> parse_optional_value(const boost::optional<std::string>& s) {
if (!s) return boost::optional<T>();
return boost::optional<T>(parse_value<T>(*s));
}
So far, my current solution does not work (the compiler cannot deduce the exact function to use). I guess the problem is that my solution relies on deducing the template value based on the return type of parse_value functions. I am not really sure how to fix this (or even whether it is possible to fix it, since the design approach could just be totally flawed). Does anyone know a way to solve what I am trying to do? I would really appreciate if you could just point me to a possible way to address the issues that I am having with my current implementation. BTW, I am definitely open to completely different ideas for solving this problem too.
You cannot overload functions based on return value [1]. This is precisely why the standard IO library uses the construct:
std::cin >> a >> b;
which may not be your cup of tea -- many people don't like it, and it is truly not without its problems -- but it does a nice job of providing a target type to the parser. It also has the advantage over a static parse<X>(const std::string&) prototype that it allows for chaining and streaming, as above. Sometimes that's not needed, but in many parsing contexts it is essential, and the use of operator>> is actually a pretty cool syntax. [2]
The standard library doesn't do what would be far and away the coolest thing, which is to skip string constants scanf style and allow interleaved reading.
vector<int> integers;
std::cin >> "[" >> interleave(integers, ",") >> "]";
However, that could be defined. (Possibly it would be better to use an explicit wrapper around the string literals, but actually I prefer it like that; but if you were passing a variable you'd want to use a wrapper).
[1] With the new auto declaration, the reason for this becomes even clearer.
[2] IO manipulators, on the other hand, are a cruel joke. And error handling is pathetic. But you can't have everything.
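If the parse_value<T>(s) call syntax from the question should be kept, one possible way to hand the parser a target type is tag dispatch: the wanted type travels as an extra (empty) argument, so ordinary overload resolution can pick the right parser. This is only a sketch and not part of the answer above; the names tag and parse_impl are made up, and std::istringstream and std::getline stand in for boost::lexical_cast and boost::split:

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Empty tag type that carries the wanted result type as a function argument.
template <typename T>
struct tag {};

// Generic case: anything that can be extracted with operator>>.
template <typename T>
T parse_impl(const std::string& s, tag<T>) {
    std::istringstream in(s);
    T value{};
    if (!(in >> value))
        throw std::runtime_error("no parser available for: " + s);
    return value;
}

// More specialized overload: comma-separated list of T.
template <typename T>
std::vector<T> parse_impl(const std::string& s, tag<std::vector<T>>) {
    std::vector<T> result;
    std::istringstream in(s);
    std::string part;
    while (std::getline(in, part, ','))
        result.push_back(parse_impl(part, tag<T>{}));
    return result;
}

// Single entry point; T is named explicitly, the overload is chosen via the tag.
template <typename T>
T parse_value(const std::string& s) {
    return parse_impl(s, tag<T>{});
}
```

A type such as Foo would get its own non-template parse_impl(const std::string&, tag<Foo>) overload, and parse_optional_value could keep calling parse_value<T>(*s) unchanged.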
Here is an example from the libsass parser:
const char* interpolant(const char* src) {
return recursive_scopes< exactly<hash_lbrace>, exactly<rbrace> >(src);
}
// Match a single character literal.
// Regex equivalent: /(?:x)/
template <char chr>
const char* exactly(const char* src) {
return *src == chr ? src + 1 : 0;
}
where rules could be passed into the lex method.

Reading a single character from an fstream?

I'm trying to move from stdio to iostream, which is proving very difficult. I've got the basics of opening a file and closing it, but I really don't have a clue as to what a stream even is yet, or how streams work.
In stdio everything's relatively easy and straightforward compared to this. What I need to be able to do is:
Read a single character from a text file.
Call a function based on what that character is.
Repeat till I've read all the characters in the file.
What I have so far is.. not much:
int main()
{
std::ifstream("sometextfile.txt", std::ios::in);
// this is SUPPOSED to be the while loop for reading. I got here and realized I have
//no idea how to even read a file
while()
{
}
return 0;
}
What I need to know is how to get a single character and how that character is actually stored. (Is it a string? An int? A char? Can I decide for myself how to store it?)
Once I know that I think I can handle the rest. I'll store the character in an appropriate container, then use a switch to do things based on what that character actually is. It'd look something like this.
int main()
{
std::ifstream textFile("sometextfile.txt", std::ios::in);
while(..able to read?)
{
char/int/string readItem;
//this is where the fstream would get the character and I assume stick it into readItem?
switch(readItem)
{
case 1:
//dosomething
break;
case ' ':
//dosomething etc etc
break;
case '\n':
}
}
return 0;
}
Notice that I need to be able to check for white space and new lines, hopefully it's possible. It would also be handy if instead of one generic container I could store numbers in an int and chars in a char. I can work around it if not though.
Thanks to anyone who can explain to me how streams work and what all is possible with them.
You also can abstract away the whole idea of getting a single character with streambuf_iterators, if you want to use any algorithms:
#include <iterator>
#include <fstream>
int main(){
typedef std::istreambuf_iterator<char> buf_iter;
std::fstream file("name");
for(buf_iter i(file), e; i != e; ++i){
char c = *i;
}
}
You can also use standard for_each algorithm:
#include <iterator>
#include <algorithm>
#include <fstream>
void handleChar(const char& c)
{
switch (c) {
case 'a': // do something
break;
case 'b': // do something else
break;
// etc.
}
}
int main()
{
std::ifstream file("file.txt");
if (file)
std::for_each(std::istream_iterator<char>(file),
std::istream_iterator<char>(),
handleChar);
else {
// couldn't open the file
}
}
istream_iterator skips whitespace characters. If those are meaningful in your file use istreambuf_iterator instead.
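A small sketch of that difference, using std::istringstream in place of a file so it can be tried in isolation (the two helper names are made up for this example):

```cpp
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>

// Counts characters as seen through istream_iterator, which uses formatted
// extraction (operator>>) and therefore skips spaces and newlines.
std::size_t count_with_istream_iterator(const std::string& text) {
    std::istringstream in(text);
    return static_cast<std::size_t>(
        std::distance(std::istream_iterator<char>(in), std::istream_iterator<char>()));
}

// Counts characters as seen through istreambuf_iterator, which reads the
// underlying buffer directly and yields every character unchanged.
std::size_t count_with_istreambuf_iterator(const std::string& text) {
    std::istringstream in(text);
    return static_cast<std::size_t>(
        std::distance(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>()));
}
```

For the text "a b\nc" the first function counts 3 characters and the second counts 5, so only the second is suitable when spaces and newlines must be handled in the switch.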
This has already been answered but whatever.
You can use the comma operator to create a loop that behaves like a for-each loop: it goes through the entire file, reads every character one by one, and stops when it's done.
char c;
while((file.get(c), file.eof()) == false){
/*Your switch statement with c*/
}
Explanation:
The first part of the expression in the while loop, (file.get(c), file.eof()),
works as follows. First, file.get(c) is executed, which reads a character and stores it in c. Then, due to the comma operator, its return value is discarded and file.eof() is executed, which returns a bool indicating whether the end of the file has been reached. This value is then compared with false.
Side Note:
ifstream::get() always reads the next character, which means calling it twice reads the first two characters in the file.
fstream::get
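For comparison, here is the comma loop next to a loop that tests the result of get(c) directly. This is a sketch that takes any std::istream so it can be tried with a std::istringstream; the function names are made up:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Comma-operator version from above: read first, then test eof().
std::string read_all_comma(std::istream& file) {
    std::string out;
    char c;
    while ((file.get(c), file.eof()) == false) {
        out += c;  // c holds the character that was just read
    }
    return out;
}

// Alternative: get(c) returns the stream, which converts to false
// as soon as a read fails for any reason.
std::string read_all_direct(std::istream& file) {
    std::string out;
    char c;
    while (file.get(c)) {
        out += c;
    }
    return out;
}
```

Both return every character, including spaces and newlines. One caveat, as far as I can tell: if the stream is in a failed state without eofbit (for example a file that could not be opened), file.eof() never becomes true and the comma version would not terminate, whereas the second loop stops on any failure.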
Next time you have a similar problem, go to cplusplusreference or a similar site, locate the class you have a problem with, and read the description of every method. Normally, this solves the problem. Googling also works.
I would honestly just avoid iterators here since it's just hurting readability. Instead, consider:
int main()
{
std::ifstream file("sometextfile.txt");
char c;
while(file >> c) {
// do something with c
}
// file reached EOF
return 0;
}
This works because the stream is convertible to bool (operator bool in C++11): it yields true as long as no read has failed and false once one has, for example at EOF. Because file >> c returns the stream itself, it can be used as the while condition. Note that operator>> skips whitespace by default, so spaces and newlines never reach the loop body; use file.get(c) instead if they matter.
Using an iterator is only really useful if you intend to use other functions from <algorithm>, but for plain reading, using the stream operator is simpler and easier to read.
char a;
while (textFile.get(a)) { // the loop ends as soon as a read fails, so a is always valid here
switch(a)
{
case 1:
//dosomething
break;
case ' ':
//dosomething etc etc
break;
case '\n':
//dosomething for a newline
break;
}
}

Bind temporary to non-const reference

Rationale
I try to avoid assignments in C++ code completely. That is, I use only initialisations and declare local variables as const whenever possible (i.e. always except for loop variables or accumulators).
Now, I’ve found a case where this doesn’t work. I believe this is a general pattern but in particular it arises in the following situation:
Problem Description
Let’s say I have a program that loads the contents of an input file into a string. You can either call the tool by providing a filename (tool filename) or by using the standard input stream (cat filename | tool). Now, how do I initialise the string?
The following doesn’t work:
bool const use_stdin = argc == 1;
std::string const input = slurp(use_stdin ? static_cast<std::istream&>(std::cin)
: std::ifstream(argv[1]));
Why doesn’t this work? Because the prototype of slurp needs to look as follows:
std::string slurp(std::istream&);
That is, the argument is non-const and as a consequence I cannot bind it to a temporary. There doesn’t seem to be a way around this using a separate variable either.
Ugly Workaround
At the moment, I use the following solution:
std::string input;
if (use_stdin)
input = slurp(std::cin);
else {
std::ifstream in(argv[1]);
input = slurp(in);
}
But this is rubbing me the wrong way. First of all it’s more code (in SLOCs) but it’s also using an if instead of the (here) more logical conditional expression, and it’s using assignment after declaration which I want to avoid.
Is there a good way to avoid this indirect style of initialisation? The problem can likely be generalised to all cases where you need to mutate a temporary object. Aren’t streams in a way ill-designed to cope with such cases (a const stream makes no sense, and yet working on a temporary stream does make sense)?
Why not simply overload slurp?
std::string slurp(char const* filename) {
std::ifstream in(filename);
return slurp(in);
}
int main(int argc, char* argv[]) {
bool const use_stdin = argc == 1;
std::string const input = use_stdin ? slurp(std::cin) : slurp(argv[1]);
}
It is a general solution with the conditional operator.
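For completeness, a sketch of what the two overloads could look like together. The body of slurp(std::istream&) is not shown in the question, so the rdbuf-based implementation here is an assumption:

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Assumed implementation: copy the whole stream buffer into a string.
std::string slurp(std::istream& in) {
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Overload from the answer: the ifstream is a named local variable here,
// so it binds to the non-const reference without trouble.
std::string slurp(char const* filename) {
    std::ifstream in(filename);
    return slurp(in);
}
```

Both branches of use_stdin ? slurp(std::cin) : slurp(argv[1]) then have type std::string, so the conditional operator is happy and input can stay const.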
The solution with the if is more or less the standard solution when
dealing with argv:
if ( argc == 1 ) {
process( std::cin );
} else {
for ( int i = 1; i != argc; ++ i ) {
std::ifstream in( argv[i] );
if ( in.is_open() ) {
process( in );
} else {
std::cerr << "cannot open " << argv[i] << std::endl;
}
}
}
This doesn't handle your case, however, since your primary concern is to
obtain a string, not to "process" the filename args.
In my own code, I use a MultiFileInputStream that I've written, which
takes a list of filenames in the constructor, and only returns EOF when
the last has been read: if the list is empty, it reads std::cin. This
provides an elegant and simple solution to your problem:
MultiFileInputStream in(
std::vector<std::string>( argv + 1, argv + argc ) );
std::string const input = slurp( in );
This class is worth writing, as it is generally useful if you often
write Unix-like utility programs. It is definitely not trivial, however,
and may be a lot of work if this is a one-time need.
A more general solution is based on the fact that you can call a
non-const member function on a temporary, and the fact that most of the
member functions of std::istream return a std::istream&—a
non const-reference which will then bind to a non const reference. So
you can always write something like:
std::string const input = slurp(
use_stdin
? std::cin.ignore( 0 )
: std::ifstream( argv[1] ).ignore( 0 ) );
I'd consider this a bit of a hack, however, and it has the more general
problem that you can't check whether the open (called by the constructor
of std::ifstream) worked.
More generally, although I understand what you're trying to achieve, I
think you'll find that IO will almost always represent an exception.
You can't read an int without having defined it first, and you can't
read a line without having defined the std::string first. I agree
that it's not as elegant as it could be, but then, code which correctly
handles errors is rarely as elegant as one might like. (One solution
here would be to derive from std::ifstream to throw an exception if
the open didn't work; all you'd need is a constructor which checked for
is_open() in the constructor body.)
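The derived class mentioned in the parenthesis could be sketched roughly like this. The class name CheckedIfstream and the choice of std::runtime_error are assumptions, not something the answer specifies:

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// An ifstream that refuses to exist unless the file could be opened.
class CheckedIfstream : public std::ifstream {
public:
    explicit CheckedIfstream(const std::string& path)
        : std::ifstream(path.c_str()) {
        if (!is_open())
            throw std::runtime_error("cannot open " + path);
    }
};
```

With this, a temporary such as CheckedIfstream(argv[1]).ignore(0) can no longer silently hand an unopened stream to slurp; the failure surfaces as an exception at construction.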
All SSA-style languages need to have phi nodes to be usable, realistically. You would run into the same problem in any case where you need to construct from two different types depending on the value of the condition. The ternary operator cannot handle such cases. Of course, in C++11 there are other tricks, like moving the stream or suchlike, or using a lambda, and the design of IOstreams is virtually the exact antithesis of what you're trying to do, so in my opinion, you would just have to make an exception.
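The lambda trick could look roughly like the following sketch: the branching lives inside an immediately invoked lambda, so the result can still initialise a const variable. The slurp body and the load_input wrapper (with the standard input stream passed in as a parameter so it can be tried without the real std::cin) are assumptions made for this example:

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Assumed implementation of slurp: read the whole stream into a string.
std::string slurp(std::istream& in) {
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Hypothetical wrapper around the pattern from the question.
std::string load_input(int argc, char* argv[], std::istream& standard_in) {
    // The lambda is called on the spot; input is initialised once and never assigned.
    std::string const input = [&]() -> std::string {
        if (argc == 1)
            return slurp(standard_in);
        std::ifstream in(argv[1]);
        return slurp(in);
    }();
    return input;
}
```

The if is still there, but it is confined to the initialiser, which is about as close to a phi node as C++11 gets.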
Another option might be an intermediate variable to hold the stream:
std::istream&& is = argc==1? std::move(std::cin) : std::ifstream(argv[1]);
std::string const input = slurp(is);
Taking advantage of the fact that named rvalue references are lvalues.

C++: std::istream check for EOF without reading / consuming tokens / using operator>>

I would like to test if a std::istream has reached the end without reading from it.
I know that I can check for EOF like this:
if (is >> something)
but this has a series of problems. Imagine there are many, possibly virtual, methods/functions which expect std::istream& passed as an argument.
This would mean I have to do the "housework" of checking for EOF in each of them, possibly with different type of something variable, or create some weird wrapper which would handle the scenario of calling the input methods.
All I need to do is:
if (!IsEof(is)) Input(is);
the method IsEof should guarantee that the stream is not changed for reading, so that the above line is equivalent to:
Input(is)
as regards the data read in the Input method.
If there is no generic solution which would work for any std::istream, is there any way to do this for std::ifstream or cin?
EDIT:
In other words, the following assert should always pass:
while (!IsEof(is)) {
int something;
assert(is >> something);
}
The istream class has an eof bit that can be checked by using the is.eof() member.
Edit: So you want to see if the next character is the EOF marker without removing it from the stream? if (is.peek() == EOF) is probably what you want then. See the documentation for istream::peek
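A sketch of the peek-based check, using std::char_traits<char>::eof() for the comparison (the function name at_eof is made up here):

```cpp
#include <istream>
#include <sstream>
#include <string>

// True if there is no further character to read. peek() looks at the next
// character without extracting it, so a following read still sees it.
bool at_eof(std::istream& is) {
    return is.peek() == std::char_traits<char>::eof();
}
```

This only answers "is there another character?". If the remaining input is just whitespace, at_eof returns false while a following is >> something still fails, so it does not make the assert in the question safe in general.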
That's impossible. How is the IsEof function supposed to know that the next item you intend to read is an int?
Should the following also not trigger any asserts?
while(!IsEof(in))
{
int x;
double y;
if( rand() % 2 == 0 )
{
assert(in >> x);
} else {
assert(in >> y);
}
}
That said, you can use the exceptions method to keep the "house-keeping" in one place.
Instead of
if(!IsEof(is)) Input(is);
try
is.exceptions( ifstream::eofbit /* | ifstream::failbit etc. if you like */ );
try {
Input(is);
} catch(const ifstream::failure& ) {
}
It doesn't stop you from reading before it's "too late", but it does obviate the need to have if(is >> x) if(is >> y) etc. in all the functions.
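A self-contained sketch of this pattern. Input here is a hypothetical stand-in that reads two ints without any checks, and the mask uses failbit and badbit rather than eofbit: as far as I can tell, eofbit is also set when the last value is read successfully right at the end of the input, so including it would turn a good read into an exception.

```cpp
#include <ios>
#include <istream>
#include <sstream>

// Hypothetical stand-in for the Input functions: no error checks inside.
void Input(std::istream& is, int& a, int& b) {
    is >> a >> b;
}

// All the "house-keeping" in one place: returns false if any read in Input failed.
bool try_input(std::istream& is, int& a, int& b) {
    const std::ios_base::iostate old_mask = is.exceptions();
    bool ok = true;
    try {
        // Setting the mask throws immediately if the stream has already failed.
        is.exceptions(std::ios_base::failbit | std::ios_base::badbit);
        Input(is, a, b);
    } catch (const std::ios_base::failure&) {
        ok = false;
    }
    is.exceptions(old_mask);  // restore the caller's mask
    return ok;
}
```

Called with a stream holding "1 2" this returns true; with "1" or "1 x" it returns false, without Input itself containing a single if.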
Normally,
if (is)
{
}
is enough. There are also .good(), .bad(), and .fail() for more exact information.
Here is a reference link: http://www.cplusplus.com/reference/iostream/istream/
There are good reasons why there is no isEof function: it is hard to specify in a usable way. For instance, operator>> usually begins by skipping white space (depending on a flag), while some other input functions are able to read spaces. How would your isEof() handle that situation? Begin by skipping spaces or not? Would it depend on the flag used by operator>> or not? Would it restore the white space in the stream or not?
My advice is to use the standard idiom and characterize input failure instead of trying to predict only one cause of it: you'd still need to characterize and handle the others.
No, in the general case there is no way of knowing if the next read operation will reach eof.
If the stream is connected to a keyboard, the EOF condition is that I will type Ctrl+Z/Ctrl+D at the next prompt. How would IsEof(is) detect that?

Example of overloading C++ extraction operator >> to parse data

I am looking for a good example of how to overload the stream input operator (operator>>) to parse some data with simple text formatting. I have read this tutorial but I would like to do something a bit more advanced. In my case I have fixed strings that I would like to check for (and ignore). Supposing the 2D point format from the link were more like
Point{0.3 =>
0.4 }
where the intended effect is to parse out the numbers 0.3 and 0.4. (Yes, this is an awfully silly syntax, but it incorporates several ideas I need). Mostly I just want to see how to properly check for the presence of fixed strings, ignore whitespace, etc.
Update:
Oops, the comment I made below has no formatting (this is my first time using this site).
I found that whitespace can be skipped with something like
std::cin >> std::ws;
And for eating up strings I have
static bool match_string(std::istream &is, const char *str){
size_t nstr = strlen(str);
while(nstr){
if(is.peek() == *str){
is.ignore(1);
++str;
--nstr;
}else{
is.setstate(is.rdstate() | std::ios_base::failbit);
return false;
}
}
return true;
}
Now it would be nice to be able to get the position (line number) of a parsing error.
Update 2:
Got line numbers and comment parsing working, using just 1 character look-ahead. The final result can be seen here in AArray.cpp, in the function parse(). The project is a (de)serializable C++ PHP-like array class.
Your operator>>(istream &, object &) should get data from the input stream, using its formatted and/or unformatted extraction functions, and put it into your object.
If you want to be more safe (after a fashion), construct and test an istream::sentry object before you start. If you encounter a syntax error, you may call setstate( ios_base::failbit ) to prevent any other processing until you call my_stream.clear().
See <istream> (and istream.tcc if you're using SGI STL) for examples.
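Putting the pieces from this thread together, a parser for the Point{0.3 => 0.4 } format could be sketched as follows. The Point struct and the helper name expect are assumptions for this example; the helper does the same job as match_string above, and istream::sentry and line tracking are left out to keep it short:

```cpp
#include <ios>
#include <istream>
#include <sstream>
#include <string>

struct Point {
    double x;
    double y;
};

// Consumes exactly the characters of `text`; sets failbit at the first mismatch.
std::istream& expect(std::istream& is, const char* text) {
    for (; *text != '\0' && is; ++text) {
        if (is.peek() == std::char_traits<char>::to_int_type(*text))
            is.get();
        else
            is.setstate(std::ios_base::failbit);
    }
    return is;
}

// Parses "Point{ x => y }" with optional whitespace around the fixed strings.
std::istream& operator>>(std::istream& is, Point& p) {
    Point tmp{};
    expect(is >> std::ws, "Point{");
    is >> tmp.x;                      // formatted extraction skips whitespace itself
    expect(is >> std::ws, "=>");
    is >> tmp.y;
    expect(is >> std::ws, "}");
    if (is)
        p = tmp;                      // only touch the target on a complete match
    return is;
}
```

On a syntax error the stream is left with failbit set and the target object untouched, so the caller can test the stream as usual and call clear() before trying anything else.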