Distinguishing between failure and end of file in read loop


The idiomatic loop to read from an istream is
while (thestream >> value)
{
// do something with value
}
Now this loop has one problem: it does not distinguish whether it terminated because of end of file or because of an error. For example, take the following test program:
#include <iostream>
#include <sstream>
void readbools(std::istream& is)
{
bool b;
while (is >> b)
{
std::cout << (b ? "T" : "F");
}
std::cout << " - " << is.good() << is.eof() << is.fail() << is.bad() << "\n";
}
void testread(std::string s)
{
std::istringstream is(s);
is >> std::boolalpha;
readbools(is);
}
int main()
{
testread("true false");
testread("true false tr");
}
The first call to testread contains two valid bools, and therefore is not an error. The second call ends with a third, incomplete bool, and therefore is an error. Nevertheless, both behave the same. In the first case, reading the boolean value fails because there is none, while in the second case it fails because it is incomplete, and in both cases EOF is hit. Indeed, the program above outputs the same line twice:
TF - 0110
TF - 0110
To solve this problem, I thought of the following solution:
while (thestream >> std::ws && !thestream.eof() && thestream >> value)
{
// do something with value
}
The idea is to detect regular EOF before actually trying to extract the value. Because there might be whitespace at the end of the file (which would not be an error, but cause read of the last item to not hit EOF), I first discard any whitespace (which cannot fail) and then test for EOF. Only if I'm not at the end of file, I try to read the value.
For my example program, it indeed seems to work, and I get
TF - 0100
TF - 0110
So in the first case (correct input), fail() returns false.
Now my question: Is this solution guaranteed to work, or was I just (un-)lucky that it happened to give the desired result? Also: Is there a simpler (or, if my solution is wrong, a correct) way to get the desired result?

It is very easy to differentiate between EOF and other errors, as long as you don't configure the stream to use exceptions.
Simply check stream.eof() at the end.
Before that, only check for failure/non-failure, e.g. stream.fail() or !stream. Note that good() is not the opposite of fail(): good() also returns false when only eofbit is set. So in general, never test good(), only fail().
Edit:
Some example code, namely your example modified to detect an invalid bool specification in the data:
#include <iostream>
#include <sstream>
#include <string>
#include <stdexcept>
using namespace std;
bool throwX( string const& s ) { throw runtime_error( s ); }
bool hopefully( bool v ) { return v; }
bool boolFrom( string const& s )
{
istringstream stream( s );
(stream >> boolalpha)
|| throwX( "boolFrom: failed to set boolalpha mode." );
bool result;
(stream >> result)
|| throwX( "boolFrom: failed to extract 'bool' value." );
char c; stream >> c;
hopefully( stream.eof() )
|| throwX( "boolFrom: found extra characters at end." );
return result;
}
void readbools( istream& is )
{
string word;
while( is >> word )
{
try
{
bool const b = boolFrom( word );
cout << (b ? "T" : "F") << endl;
}
catch( exception const& x )
{
cerr << "!" << x.what() << endl;
}
}
cout << "- " << is.good() << is.eof() << is.fail() << is.bad() << "\n";
}
void testread( string const& s )
{
istringstream is( s );
readbools( is );
}
int main()
{
cout << string( 60, '-' ) << endl;
testread( "true false" );
cout << string( 60, '-' ) << endl;
testread( "true false tr" );
cout << string( 60, '-' ) << endl;
testread( "true false truex" );
}
Example result:
------------------------------------------------------------
T
F
- 0110
------------------------------------------------------------
T
F
!boolFrom: failed to extract 'bool' value.
- 0110
------------------------------------------------------------
T
F
!boolFrom: found extra characters at end.
- 0110
Edit 2: in the posted code and results, added example of using eof() checking, which I forgot.
Edit 3:
The following corresponding example uses the OP’s proposed skip-whitespace-before-reading solution:
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
void readbools( istream& is )
{
bool b;
while( is >> ws && !is.eof() && is >> b ) // <- Proposed scheme.
{
cout << (b ? "T" : "F") << endl;
}
if( is.fail() )
{
cerr << "!readbools: failed to extract 'bool' value." << endl;
}
cout << "- " << is.good() << is.eof() << is.fail() << is.bad() << "\n";
}
void testread( string const& s )
{
istringstream is( s );
is >> boolalpha;
readbools( is );
}
int main()
{
cout << string( 60, '-' ) << endl;
testread( "true false" );
cout << string( 60, '-' ) << endl;
testread( "true false tr" );
cout << string( 60, '-' ) << endl;
testread( "true false truex" );
}
Example result:
------------------------------------------------------------
T
F
- 0100
------------------------------------------------------------
T
F
!readbools: failed to extract 'bool' value.
- 0110
------------------------------------------------------------
T
F
T
!readbools: failed to extract 'bool' value.
- 0010
The main difference is that this approach produces 3 successfully read values in the third case, even though the third value is incorrectly specified (as "truex").
I.e. it fails to recognize an incorrect specification as such.
Of course, my ability to write Code That Does Not Work™ is no proof that it cannot work. But I am fairly good at coding up things, and I could not see any way to detect the "truex" as incorrect with this approach (while it was easy to do with the read-words, exception-based approach). So at least for me, the read-words, exception-based approach is simpler, in the sense that it is easy to make it behave correctly.

Related

What am I iterating in this find_if function?

Here is my code:
bool isNotValid (char a) {
if (isalpha(a) || a == '_')
{
cout << "\n- isalpha";
return 0;
}
else
{
cout << "\n- notalpha";
return 1;
}
}
bool test123(const string& test)
{
return find_if(test.begin(), test.end(), isNotValid) != test.end();
}
int main()
{
string test;
cout << "Test input: ";
cin >> test;
if (!test123(test))
cout << "\n- Valid\n";
else
cout << "\n- Not Valid\n";
return 0;
}
This is part of my code to check the validity of a username in my program. I don't really understand what exactly I am iterating through when I pass the string to my function as the address of the string. CPP reference states that find_if iterates from the first to the last position of a sequence.
I poked through the code with cout statements at different locations, but still didn't quite catch what is going on.

You are iterating your string. You did not pass the address of the string. The function takes the string as a reference to const, meaning it passes the actual string (no copy is made) and the function is not allowed to modify the string. You are iterating character by character in your string and calling your function isNotValid() on each character.
Notes:
Instead of returning 1 or 0 from isNotValid(), return true or false.
Consider flipping your logic and renaming the function to isValid() instead. You would also have to change test123() to use std::find_if_not(). Finally, you would check if the returned iterator is end() and not if it's not.
But, if you do change isNotValid() to isValid(), you'd be better off switching from std::find_if() entirely to std::all_of(). It makes more sense, is more readable, and returns a bool directly (no need to compare against end()).
But if you want to keep your function isNotValid(), the comment that suggests using std::any_of() is what I would recommend for the same reasons.
Here's my take on your code:
#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>
bool isValid(char a) {
return std::isalpha(static_cast<unsigned char>(a)) || a == '_'; // !
}
bool test123(const std::string& test) {
return std::all_of(test.begin(), test.end(), isValid); // !
}
int main() {
std::string testOne{"i_am_valid"};
std::string testTwo{"i_am_invalid_123"};
std::cout << "Testing: " << testOne << " : " << std::boolalpha
<< test123(testOne) << '\n';
std::cout << "Testing: " << testTwo << " : " << std::boolalpha
<< test123(testTwo) << '\n';
}
Output:
❯ ./a.out
Testing: i_am_valid : true
Testing: i_am_invalid_123 : false
I would argue that readability has stayed largely the same, but the mental load has been shifted; the Boolean flips make a bit more sense.
As you progress in your learning, you might not even want to have the function isValid() if it's a one-off thing. C++11 introduced lambdas, or functions as objects. C++20 also introduced ranges, so you don't have to pass a pair of iterators if you intend to iterate the whole container anyway.
#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>
bool test123(const std::string& test) {
return std::ranges::all_of(test, [](const auto& c) {
return std::isalpha(static_cast<unsigned char>(c)) || c == '_';
}); // !
}
int main() {
std::string testOne{"i_am_valid"};
std::string testTwo{"i_am_invalid_123"};
std::cout << "Testing: " << testOne << " : " << std::boolalpha
<< test123(testOne) << '\n';
std::cout << "Testing: " << testTwo << " : " << std::boolalpha
<< test123(testTwo) << '\n';
}
That's a bit hairy to read if you're not familiar with lambdas, but I find lambdas useful for checks like this where you're just doing it the one time.

Char* input check if it is hex or dec and parse it into unsigned short using appropriate error handling and reporting

I am trying to parse the input into unsigned short. The input can be anything but we can only accept hex or decimal. It needs to fit into unsigned short therefore no negative values or over 0xffff (65535). Invalid values must report errors appropriately and with enough information using C++ features.
My attempt (but it doesn't check for invalid hex values e.g. 5xffff):
void parse_input(char *input, unsigned short &output)
{
std::string soutput(input);
int myint1;
try
{
myint1 = std::stoi(soutput, 0, 0);
if (myint1 > std::numeric_limits<unsigned short>::max())
{
std::cerr << "Value: " << myint1
<< " is out of bounds!" << std::endl;
exit(EXIT_FAILURE);
}
output = myint1;
}
catch (std::exception &e)
{
std::cerr << "exception caught: " << e.what() << std::endl;
exit(EXIT_FAILURE);
}
}
Another attempt which also doesn't do all that (and apparently usage of errno is not acceptable):
auto n = strtoul(argv[2], NULL, 0);
if (errno == ERANGE || n > std::numeric_limits<unsigned short>::max()) {
}
else {
}
So the actual question based on the above is, what is the most efficient and effective way to resolve this using C++ features? Please provide an example.
Many thanks in advance.

So the actual question based on the above is, what is the most efficient and effective way to resolve this using C++ features? Please provide an example.
As your input numbers seem to be distinguished using a 0x for hex input and no prefix for decimal numbers, here's a small solution using a custom I/O manipulator:
#include <cctype>
#include <istream>
std::istream& hex_or_decimal(std::istream& is) {
    // peek() returns an int so that EOF stays distinguishable from a character
    int peek = is.peek();
    int zero_count = 0;
    while(peek == '0' || std::isspace(peek)) {
        if(peek == '0') {
            ++zero_count;
        }
        // Consume whitespace and 0 prefixes as they won't affect the result
        is.get();
        peek = is.peek();
        // Exactly one leading 0 followed by x marks a hex number
        if(peek == 'x' && zero_count == 1) {
            is.get();
            is >> std::hex;
            return is;
        }
    }
    is >> std::dec;
    return is;
}
And use that like:
int main()
{
std::istringstream iss { "5 0x42 33 044 00x777 0x55" };
short input = 0;
while(iss >> hex_or_decimal >> input) {
std::cout << std::dec << input
<< " 0x" << std::hex << input << std::endl;
}
if(iss.fail()) {
std::cerr << "Invalid input!" << std::endl;
}
}
The output is
5 0x5
66 0x42
33 0x21
44 0x2c
Invalid input!
Note:
The 5xffff value is signalled as invalid after 5 was consumed correctly by the stream.
You can easily adapt that to your needs (e.g. throwing an exception at invalid input) using the std::istream standard capabilities and flags.
E.g.:
Throwing and catching exceptions
int main()
{
std::istringstream iss { "5 0x42 33 044 00x777 0x55" };
iss.exceptions(std::ifstream::failbit); // <<<
try {
short input = 0;
while(iss >> hex_or_decimal >> input) {
std::cout << std::dec << input
<< " 0x" << std::hex << input << std::endl;
}
}
catch (const std::ios_base::failure &fail) { // <<<
std::cerr << "Invalid input!" << std::endl;
}
}

Why does this leveldb code truncate "std::string"s that have spaces in them?

I wrote this piece of code to try leveldb. I am using Unix time as keys. For values that have spaces, only the last part gets saved. Here is the code. I am running Linux Kernel 4.4.0-47-generic
while (true) {
std::string note;
std::string key;
std::cout << "Test text here ";
std::cin >> note;
std::cout << std::endl;
if(note.size() == 0 || tolower(note.back()) == 'n' ) break;
key = std::to_string(std::time(nullptr));
status = db->Put(write_options, key, note);
if(!status.ok()) break;
}
std::cout << "Read texts........" << std::endl;
leveldb::Iterator* it = db->NewIterator(leveldb::ReadOptions());
for(it->SeekToFirst(); it->Valid(); it->Next()){
std::cout << it->key().ToString() << " " << it->value().ToString() << std::endl;
}
delete db;

The issue is not in leveldb, but in the way you read the input:
std::string note;
std::cin >> note;
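This reads only up to the first whitespace, so each word of the note arrives in a separate loop iteration. Because those iterations typically run within the same second, they produce the same key, and each Put overwrites the previous value, which is why only the last word is saved. It is a common mistake; see for example: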
reading a line from ifstream into a string variable

Difference between initializations stringstream.str( a_value ) and stringstream << a_value

Consider:
std::string s_a, s_b;
std::stringstream ss_1, ss_2;
// at this stage:
// ss_1 and ss_2 have been used and are now in some strange state
// s_a and s_b contain non-white space words
ss_1.str( std::string() );
ss_1.clear();
ss_1 << s_a;
ss_1 << s_b;
// ss_1.str().c_str() is now the concatenation of s_a and s_b,
// without space between them
ss_2.str( s_a );
ss_2.clear();
// ss_2.str().c_str() is now s_a
ss_2 << s_b; // line ***
// ss_2.str().c_str() the value of s_a is over-written by s_b
//
// Replacing line *** above with "ss_2 << ss_2.str() << " " << s_b;"
// results in ss_2 having the same content as ss_1.
Questions:
What is the difference between stringstream.str( a_value ); and
stringstream << a_value; and, specifically, why does the first not
allow concatenation via << but the second does?
Why did ss_1 automatically get white-space between s_a and s_b, but
do we need to explicitly add white space in the line that could
replace line ***: ss_2 << ss_2.str() << " " << s_b;?

The problem you're experiencing is because std::stringstream is constructed by default with ios_base::openmode mode = ios_base::in|ios_base::out which is a non-appending mode.
You're interested in the output mode here (ie: ios_base::openmode mode = ios_base::out)
std::basic_stringbuf::str(const std::basic_string<CharT, Traits, Allocator>& s) operates in two different ways, depending on the openmode:
mode & ios_base::ate == false: (ie: non-appending output streams):
str will set pptr() == pbase(), so that subsequent output will overwrite the characters copied from s
mode & ios_base::ate == true: (ie: appending output streams):
str will set pptr() == pbase() + s.size(), so that subsequent output will be appended to the last character copied from s
(Note that this appending mode is new since c++11)
If you want the appending behaviour, create your stringstream with ios_base::ate:
std::stringstream ss(std::ios_base::out | std::ios_base::ate);
Simple example app here:
#include <iostream>
#include <sstream>
void non_appending()
{
std::stringstream ss;
std::string s = "hello world";
ss.str(s);
std::cout << ss.str() << std::endl;
ss << "how are you?";
std::cout << ss.str() << std::endl;
}
void appending()
{
std::stringstream ss(std::ios_base::out | std::ios_base::ate);
std::string s = "hello world";
ss.str(s);
std::cout << ss.str() << std::endl;
ss << "how are you?";
std::cout << ss.str() << std::endl;
}
int main()
{
non_appending();
appending();
return 0;
}
This will output in the 2 different ways as explained above:
hello world
how are you?
hello world
hello worldhow are you?

Suggest you read stringstream reference: http://en.cppreference.com/w/cpp/io/basic_stringstream
std::stringstream::str() Replaces the contents of the underlying string
operator<< Inserts data into the stream.

(C++ Query) Accessing the instantiated objects globally

This is a basic program to get two 5-digit numbers as string and use addition on the 2 numbers utilising operator overloading on '+' .
#include <iostream>
#include <limits>
#include <cstdlib>
#include <cstring>
#include <sstream>
using namespace std;
class IntStr
{
int InputNum;
public:
//IntStr();
IntStr(int num);
IntStr operator+ (const IntStr &);
//~IntStr();
void Display();
};
IntStr::IntStr(int num)
{
InputNum = num;
}
void IntStr::Display()
{
cout << "Number is (via Display) : " << InputNum <<endl;
}
IntStr IntStr::operator+ (const IntStr & second) {
int add_result = InputNum + second.InputNum;
return IntStr(add_result);
}
int main()
{
string str;
bool option = true;
bool option2 = true;
while (option)
{
cout << "Enter the number : " ;
if (!getline(cin, str))
{
cerr << "Something went seriously wrong...\n";
}
istringstream iss(str);
int i;
iss >> i; // Extract an integer value from the stream that wraps str
if (!iss)
{
// Extraction failed (or a more serious problem like EOF reached)
cerr << "Enter a number dammit!\n";
}
else if (i < 10000 || i > 99999)
{
cerr << "Out of range!\n";
}
else
{
// Process i
//cout << "Stream is: " << iss << endl; //For debugging purposes only
cout << "Number is : " << i << endl;
option = false;
IntStr obj1 = IntStr(i);
obj1.Display();
}
}//while
while (option2)
{
cout << "Enter the second number : " ;
if (!getline(cin, str))
{
cerr << "Something went seriously wrong...\n";
}
istringstream iss(str);
int i;
iss >> i; // Extract an integer value from the stream that wraps str
if (!iss) //------------------------------------------> (i)
{
// Extraction failed (or a more serious problem like EOF reached)
cerr << "Enter a number dammit!\n";
}
else if (i < 10000 || i > 99999)
{
cerr << "Out of range!\n";
}
else
{
// Process i
//cout << "Stream is: " << iss << endl; //For debugging purposes only
cout << "Number is : " << i << endl;
option2 = false;
IntStr obj2 = IntStr(i);
obj2.Display();
//obj1->Display();
}
}//while
//IntStr Result = obj1 + obj2; // --------------------> (ii)
//Result.Display();
cin.get();
}
Need clarification on the points (i) & (ii) in the above code ...
(1) What does (i) actually do ?
(2) (ii) -> Does not compile.. as the error "obj1 not declared (first use this function)" comes up. Is this because obj1 & obj2 are declared only inside the while loops? How do I access them globally?

1) From http://www.cplusplus.com/reference/iostream/ios/operatornot/ :
bool operator ! ( ) const;
Evaluate stream object: Returns true if either one of the error flags (failbit or badbit) is set on the stream. Otherwise it returns false.
From http://www.cplusplus.com/reference/iostream/ios/fail/ :
failbit is generally set by an input operation when the error was related with the internal logic of the operation itself, while badbit is generally set when the error involves the loss of integrity of the stream, which is likely to persist even if a different operation is performed on the stream.
2) The two objects are not in scope; they exist only inside the braces in which they are defined.

(i) calls the overloaded operator!, which evaluates the stream in a boolean context. It checks the state of the stream to see whether the previous operation failed; if so, you cannot rely on the value in the integer variable i, because the input on the stream was not an integer.
(ii) The variables obj1 and obj2 are defined in the scope of the while loop, so they are not available outside it. You can declare them outside the while loop, in which case each variable will hold the last value it was assigned in the loop.

if (!iss)
tests whether the stream is in a failed state (failbit or badbit set), which will be the case if the conversion failed, for example because the input was not a number or the stream was already at its end
obj1 is defined here:
else
{
// Process i
//cout << "Stream is: " << iss << endl; //For debugging purposes only
cout << "Number is : " << i << endl;
option = false;
IntStr obj1 = IntStr(i);
obj1.Display();
}
It is therefore local to the else-block and can't be accessed outside it. If you want to increase its scope, move its definition outside of the block. It is not a good idea to move it outside of all blocks (i.e. make it global), however.
