Is std::stoi actually safe to use?

Is std::stoi actually safe to use? - c++

I had a lovely conversation with someone about the downfalls of std::stoi. To put it bluntly, it uses std::strtol internally, and throws if that reports an error. According to them, though, std::strtol shouldn't report an error for an input of "abcxyz", causing stoi not to throw std::invalid_argument.
First of all, here are two programs tested on GCC about the behaviours of these cases:
strtol
stoi
Both of them show success on "123" and failure on "abc".
I looked in the standard to pull more info:
§ 21.5
Throws: invalid_argument if strtol, strtoul, strtoll, or strtoull reports that
no conversion could be performed. Throws out_of_range if the converted value is
outside the range of representable values for the return type.
That sums up the behaviour of relying on strtol. Now what about strtol? I found this in the C11 draft:
§7.22.1.4
If the subject sequence is empty or does not have the expected form, no
conversion is performed; the value of nptr is stored in the object
pointed to by endptr, provided that endptr is not a null pointer.
Given the situation of passing in "abc", the C standard dictates that nptr, which points to the beginning of the string, would be stored in endptr, the pointer passed in. This seems consistent with the test. Also, 0 should be returned, as stated by this:
§7.22.1.4
If no conversion could be performed, zero is returned.
The previous reference said that no conversion would be performed, so it must return 0. These conditions now comply with the C++11 standard for stoi throwing std::invalid_argument.
The result of this matters to me because I don't want to go around recommending stoi as a better alternative to other methods of string to int conversion, or using it myself as if it worked the way you'd expect, if it doesn't catch text as an invalid conversion.
So after all of this, did I go wrong somewhere? It seems to me that I have good proof of this exception being thrown. Is my proof valid, or is std::stoi not guaranteed to throw that exception when given "abc"?

Does std::stoi throw an error on the input "abcxyz"?
Yes.
I think your confusion may come from the fact that strtol never reports an error except on overflow. It can report that no conversion was performed, but this is never referred to as an error condition in the C standard.
strtol is defined similarly by all three C standards, and I will spare you the boring details, but it basically defines a "subject sequence" that is a substring of the input string corresponding to the actual number. The following four conditions are equivalent:
the subject sequence has the expected form (in plain English: it is a number)
the subject sequence is non-empty
a conversion has occurred
*endptr != nptr (this only makes sense when endptr is non-null)
When there is an overflow, the conversion is still said to have occurred.
Now, it is quite clear that because "abcxyz" does not contain a number, the subject sequence of the string "abcxyz" must be empty, so that no conversion can be performed. The following C90/C99/C11 program will confirm it experimentally:
#include <stdio.h>
#include <stdlib.h>
int main() {
char *nptr = "abcxyz", *endptr[1];
strtol(nptr, endptr, 0);
if (*endptr == nptr)
printf("No conversion could be performed.\n");
return 0;
}
This implies that any conformant implementation of std::stoi must throw invalid_argument when given the input "abcxyz" without an optional base argument.
Does this mean that std::stoi has satisfactory error checking?
No. The person you were talking to is correct when she says that std::stoi is more lenient than performing the full check errno == 0 && end != start && *end=='\0' after std::strtol, because std::stoi silently strips away all characters starting from the first non-numeric character in the string.
In fact off the top of my head the only language whose native conversion behaves somewhat like std::stoi is Javascript, and even then you have to force base 10 with parseInt(n, 10) to avoid the special case of hexadecimal numbers:
input | std::atoi std::stoi Javascript full check
===========+=============================================================
hello | 0 error error(NaN) error
0xygen | 0 0 error(NaN) error
0x42 | 0 0 66 error
42x0 | 42 42 42 error
42 | 42 42 42 42
-----------+-------------------------------------------------------------
languages | Perl, Ruby, Javascript Javascript C#, Java,
| PHP, C... (base 10) Python...
Note: there are also differences among languages in the handling of whitespace and redundant + signs.
Ok, so I want full error checking, what should I use?
I'm not aware of any built-in function that does this, but boost::lexical_cast<int> will do what you want. It is particularly strict since it even rejects surrounding whitespace, unlike Python's int() function. Note that invalid characters and overflows result in the same exception, boost::bad_lexical_cast.
#include <boost/lexical_cast.hpp>
int main() {
std::string s = "42";
try {
int n = boost::lexical_cast<int>(s);
std::cout << "n = " << n << std::endl;
} catch (boost::bad_lexical_cast) {
std::cout << "conversion failed" << std::endl;
}
}

Related

what does cout << "\n"[a==N]; do?

In the following example:
cout<<"\n"[a==N];
I have no clue about what the [] option does in cout, but it does not print a newline when the value of a is equal to N.

I have no clue about what the [] option does in cout
This is actually not a cout option, what is happening is that "\n" is a string literal. A string literal has the type array of n const char, the [] is simply an index into an array of characters which in this case contains:
\n\0
note \0 is appended to all string literals.
The == operator results in either true or false, so the index will be:
0 if false, if a does not equal N resulting in \n
1 if true, if a equals N resulting in \0
This is rather cryptic and could have been replaced with a simple if.
For reference the C++14 standard(Lightness confirmed the draft matches the actual standard) with the closest draft being N3936 in section 2.14.5 String literals [lex.string] says (emphasis mine):
string literal has type “array of n const char”, where n is the
size of the string as defined below, and has static storage duration
(3.7).
and:
After any necessary concatenation, in translation phase 7 (2.2),
’\0’ is appended to every string literal so that programs that scan a string can find its end.
section 4.5 [conv.prom] says:
A prvalue of type bool can be converted to a prvalue of type int, with
false becoming zero and true becoming one.
Writing a null character to a text stream
The claim was made that writing a null character(\0) to a text stream is undefined behavior.
As far as I can tell this is a reasonable conclusion, cout is defined in terms of C stream, as we can see from 27.4.2 [narrow.stream.objects] which says:
The object cout controls output to a stream buffer associated with the object stdout, declared in
<cstdio> (27.9.2).
and the C11 draft standard in section 7.21.2 Streams says:
[...]Data read in from a text stream will necessarily compare equal to the data
that were earlier written out to that stream only if: the data consist only of printing
characters and the control characters horizontal tab and new-line;
and printing characters are covered in 7.4 Character handling <ctype.h>:
[...]the term control character
refers to a member of a locale-specific set of characters that are not printing
characters.199) All letters and digits are printing characters.
with footnote 199 saying:
In an implementation that uses the seven-bit US ASCII character set, the printing characters are those
whose values lie from 0x20 (space) through 0x7E (tilde); the control characters are those whose
values lie from 0 (NUL) through 0x1F (US), and the character 0x7F (DEL).
and finally we can see that the result of sending a null character is not specified and we can see this is undefined behavior from section 4 Conformance which says:
[...]Undefined behavior is otherwise
indicated in this International Standard by the words ‘‘undefined behavior’’ or by the
omission of any explicit definition of behavior.[...]
We can also look to the C99 rationale which says:
The set of characters required to be preserved in text stream I/O are those needed for writing C
programs; the intent is that the Standard should permit a C translator to be written in a maximally
portable fashion. Control characters such as backspace are not required for this purpose, so their
handling in text streams is not mandated.

cout<<"\n"[a==N];
I have no clue about what the [] option does in cout
In C++ operator Precedence table, operator [] binds tighter than operator <<, so your code is equivalent to:
cout << ("\n"[a==N]); // or cout.operator <<("\n"[a==N]);
Or in other words, operator [] does nothing directly with cout. It is used only for indexing of string literal "\n"
For example for(int i = 0; i < 3; ++i) std::cout << "abcdef"[i] << std::endl; will print characters a, b and c on consecutive lines on the screen.
Because string literals in C++ are always terminated with null character('\0', L'\0', char16_t(), etc), a string literal "\n" is a const char[2] holding the characters '\n' and '\0'
In memory layout this looks like:
+--------+--------+
| '\n' | '\0' |
+--------+--------+
0 1 <-- Offset
false true <-- Result of condition (a == n)
a != n a == n <-- Case
So if a == N is true (promoted to 1), expression "\n"[a == N] results in '\0' and '\n' if result is false.
It is functionally similar (not same) to:
char anonymous[] = "\n";
int index;
if (a == N) index = 1;
else index = 0;
cout << anonymous[index];
valueof "\n"[a==N] is '\n' or '\0'
typeof "\n"[a==N] is const char
If the intention is to print nothing (Which may be different from printing '\0' depending on platform and purpose), prefer the following line of code:
if(a != N) cout << '\n';
Even if your intention is to write either '\0' or '\n' on the stream, prefer a readable code for example:
cout << (a == N ? '\0' : '\n');

It's probably intended as a bizarre way of writing
if ( a != N ) {
cout<<"\n";
}
The [] operator selects an element from an array. The string "\n" is actually an array of two characters: a new line '\n' and a string terminator '\0'. So cout<<"\n"[a==N] will print either a '\n' character or a '\0' character.
The trouble is that you're not allowed to send a '\0' character to an I/O stream in text mode. The author of that code might have noticed that nothing seemed to happen, so he assumed that cout<<'\0' is a safe way to do nothing.
In C and C++, that is a very poor assumption because of the notion of undefined behavior. If the program does something that is not covered by the specification of the standard or the particular platform, anything can happen. A fairly likely outcome in this case is that the stream will stop working entirely — no more output to cout will appear at all.
In summary, the effect is,
"Print a newline if a is not equal to N. Otherwise, I don't know. Crash or something."
… and the moral is, don't write things so cryptically.

It is not an option of cout but an array index of "\n"
The array index [a==N] evaluates to [0] or [1], and indexes the character array represented by "\n" which contains a newline and a nul character.
However passing nul to the iostream will have undefined results, and it would be better to pass a string:
cout << &("\n"[a==N]) ;
However, the code in either case is not particularly advisable and serves no particular purpose other than to obfuscate; do not regard it as an example of good practice. The following is preferable in most instances:
cout << (a != N ? "\n" : "") ;
or just:
if( a != N ) cout << `\n` ;

Each of the following lines will generate exactly the same output:
cout << "\n"[a==N]; // Never do this.
cout << (a==N)["\n"]; // Or this.
cout << *((a==N)+"\n"); // Or this.
cout << *("\n"+(a==N)); // Or this.
As the other answers have specified, this has nothing to do with std::cout. It instead is a consequence of
How the primitive (non-overloaded) subscripting operator is implemented in C and C++.
In both languages, if array is a C-style array of primitives, array[42] is syntactic sugar for *(array+42). Even worse, there's no difference between array+42 and 42+array. This leads to interesting obfuscation: Use 42[array] instead of array[42] if your goal is to utterly obfuscate your code. It goes without saying that writing 42[array] is a terrible idea if your goal is to write understandable, maintainable code.
How booleans are transformed to integers.
Given an expression of the form a[b], either a or b must be a pointer expression and the other; the other must be an integer expression. Given the expression "\n"[a==N], the "\n" represents the pointer part of that expression and the a==N represents the integer part of the expression. Here, a==N is a boolean expression that evaluates to false or true. The integer promotion rules specify that false becomes 0 and true becomes 1 on promotion to an integer.
How string literals degrade into pointers.
When a pointer is needed, arrays in C and C++ readily degrade into a pointer that points to the first element of the array.
How string literals are implemented.
Every C-style string literal is appended with the null character '\0'. This means the internal representation of your "\n" is the array {'\n', '\0'}.
Given the above, suppose a==N evaluates to false. In this case, the behavior is well-defined across all systems: You'll get a newline. If, on the other hand, a==N evaluates to true, the behavior is highly system dependent. Based on comments to answers to the question, Windows will not like that. On Unix-like systems where std::cout is piped to the terminal window, the behavior is rather benign. Nothing happens.
Just because you can write code like that doesn't mean you should. Never write code like that.

g++ regex crash on (possibly unsyntactic) expression

I figure the following program should either complain it can't compile the regular expression or else treat it as legal and compile it fine (I don't have the standard so I can't say for sure whether the expression is strictly legal; certainly reasonable interpretations are possible). Anyway, what happens with g++ (Ubuntu/Linaro 4.8.1-10ubuntu9) 4.8.1 is that, when run, it crashes hard
*** Error in `./a.out': free(): invalid next size (fast): 0x08b51248 ***
in the guts of the library.
Questions are:
a) it's bug, right? I assume (perhaps incorrectly) the standard doesn't say std::regex can crash if it doesn't like the syntax. (msvc eats it fine, fwiw)
b) if it's a bug, is there some easy way to see whether it's been reported or not (my first time poking around gnu-land bug systems was intimidating)?
#include <iostream>
#include <regex>
int main(void)
{
const char* Pattern = "^(%%)|";
std::regex Machine;
try {
Machine = Pattern;
}
catch(std::regex_error e)
{
std::cerr << "regex could not compile pattern: "
<< Pattern << "\n"
<< e.what() << std::endl;
throw;
}
return 0;
}

I would put this in a comment, but I can't, so...
I don't know if you already know, but it seems to be the pipe | character at the end that's causing your problems. It seems like the character representation of | as a last character (since "^(%%)|a" works fine for me) given by g++ is making a mess when regex tries to call free();
The standard (or at least the online draft I'm reading) claims that:
28.8
Class template basic_regex
[re.regex]
1 For a char-like type charT, specializations of class template basic_regex represent regular expressions
constructed from character sequences of charT characters. In the rest of 28.8, charT denotes a given char-
like type. Storage for a regular expression is allocated and freed as necessary by the member functions of
class basic_regex.
2 Objects of type specialization of basic_regex are responsible for converting the sequence of charT objects
to an internal representation. It is not specified what form this representation takes, nor how it is accessed by
algorithms that operate on regular expressions.
[ Note: Implementations will typically declare some function
templates as friends of basic_regex to achieve this — end note ]
and later,
basic_regex& operator=(const charT* ptr);
3 Requires: ptr shall not be a null pointer.
4 Effects: returns assign(ptr).
So unless g++ thinks const char* Pattern ="|"; is a null ptr (I would imagine not...),
I guess it's a bug?
EDIT: Incidentally, consecutive || (even when not at the end) seem to cause a segmentation fault for me also.

How does this C++ code work?

enum STR2INT_ERROR { SUCCESS, OVERFLOW, UNDERFLOW, INCONVERTIBLE };
STR2INT_ERROR str2int (int &i, char const *s, int base = 0)
{
char *end;
long l;
errno = 0;
l = strtol(s, &end, base);
if ((errno == ERANGE && l == LONG_MAX) || l > INT_MAX) {
return OVERFLOW;
}
if ((errno == ERANGE && l == LONG_MIN) || l < INT_MIN) {
return UNDERFLOW;
}
if (*s == '\0' || *end != '\0') {
return INCONVERTIBLE;
}
i = l;
return SUCCESS;
}
I'm trying to write a program that can parse strings read in from a file into integer values. While looking for a method to do this I found this piece of code above on a stackoverflow post:
How to parse a string to an int in C++?
However, I can't understand how it works.
Specifically, why is the programmer checking if errno == ERANGE if errno is assigned to 0? (is ERANGE a special value? )
secondly, what does "char const *s" - in the arguments list- mean?
PS: I'm not very experienced when it comes to C++ programming.

The code is using strtol() to do the parsing. This is a standard C library function. You can find documentation on strtol() here amongst other places:
strtol() man page on die.net
The errno variable is a special global variable defined by the standard C library. If a function encounters an error it is set to an error code. So while errno is assigned zero at the start of the routine, the strtol() function will assign a new value to errno if it encounters an error. The following if-statements are checking for the overflow and underflow error conditions.
The char const *s parameter is the string to be parsed. Its a pointer to a constant (read-only) string of characters. By convention strings are terminated by a NULL byte.

Whenever I have done string to int conversions in C++ I used the atoi method. There should be plenty of examples online that suit what you want to do

Most of the specialness here is with errno, not the values being compared to.
errno is a global that's used by some (especially older) library functions to signal errors. You assign 0 to it (which implicitly means there's no problem). Then, if it runs into a problem, a library function can assign some non-zero value to it to tell you want went wrong.
After calling the library function, you then typically check 1) whether it's now non-zero, and 2) if so, what value it has. Based on the value that's been assigned, you can react to the type of error that arose.
I should add, however, that many uses of errno are mostly non-portable. The C standard says that errno exists, that no library function assigns 0 to errno, but not a lot more more than that. It does not specify what non-zero values any particular function may assign to it (well, it specifies some non-zero values that some functions assign, but doesn't limit assignments to those values or those functions).

First of all, this is clearly a C program in C++ disguise.
strtol is a function from standard C library, which does the actual work. Its doumentation may be accessed there: http://linux.die.net/man/3/strtol
All other things are just preliminaries and checks.
errno is a special global variable from the C library which may be modified by standard functions in order to set an appropriate error code (yes, it's C legacy and this is not thread-safe). Its value may be set to values defined in standard header "errno.h".

errno is a library-provided global variable that strtol (as well as other library functions) uses to indicate error conditions. In the above code strtol could change errno after the user set it to 0. ERANGE is indeed a named constant provided by the standard library, which stands for some special value used by strtol to indicate out-of-range errors.
Your char const *s question is too vague. What specifically do you not understand in it? The const part means that the user code inside str2int will not be allowed to modify the string pointed by s. The compiler will do its best to prevent any modifying (or potentially modifying) operations on string pointed by s.

Scanf function with strings

I need to read input from user. The input value may be string type or int type.
If the value is int then the program insert the value into my object.
Else if the value is string then it should check the value of that string, if it's "end" then the program ends.
Halda h; //my object
string t;
int tint;
bool end=false;
while(end!=true)
{
if(scanf("%d",&tint)==1)
{
h.insert(tint);
}
else if(scanf("%s",t)==1)
{
if(t=="end")
end=true;
else if(t=="next")
if(h.empty()==false)
printf("%d\n",h.pop());
else
printf("-1\n");
}
}
The problem is that scanning string doesn't seem to work properly.
I've tried to change it to: if(cin>>t) and it worked well.
I need to get it work with scanf.

The specifier %s in the scanf() format expects a char*, not a std::string.
From C11 Standard (C++ Standard refers to it about the C standard library):
Except in the case of a % specifier, the input item (or, in the case of a %n directive, the
count of input characters) is converted to a type appropriate to the conversion specifier. If
the input item is not a matching sequence, the execution of the directive fails: this
condition is a matching failure. Unless assignment suppression was indicated by a *, the
result of the conversion is placed in the object pointed to by the first argument following
the format argument that has not already received a conversion result. If this object
does not have an appropriate type, or if the result of the conversion cannot be represented
in the object, the behavior is undefined.
Anyway, here there's is no real reason to prefer the C way, use C++ facilities. And when you use the C library, use safe functions that only reads characters up to a given limit (just like fgets, or scanf with a width specifier), otherwise you could have overflow, that leads again to undefined behavior, and some errors if you're luck.

That's a really bad way to check for end-of-input. Either use an integer or use a string.
If you choose string, make provisions to convert from string to int.
My logic would be to first check if it can be converted to integer. if it can be, then continue with the logic. If it can't be(such as if it's a float or double or some other string) then ignore and move on. If it can be, then insert it into Halda's object.
Sidenote: Do not use scanf() and printf() when you're working with C++.

Assuming string refers to std::sring this program doesn't have defined behavior. You can't really use std::string with sscanf() You could set up a buffer inside the std::string and read into that but the string wouldn't change its size. You are probably better off using streams with std::string (well, in my opinion you are always better off using streams).

Disambiguating std::isalpha() in C++

So I am currently writing a part of a program that takes user text input. I want to ignore all input characters that are not alphabetic, and so I figured std::isalpha() would be a good way to do this. Unfortunately, as far as I know there are two std::isalpha() functions, and the general one needs to be disambiguated from the locale-specific one thusly:
(int(*)(int))std::isalpha()
If I don't disambiguate, std::isalpha seems to return true when reading uppercase but false when reading lowercase letters (if I directly print the returned value, though, it returns 0 for non-alpha chars, 1 for uppercase chars, and 2 for lowercase chars). So I need to do this.
I've done so in another program before, but for some reason, in this project, I sometimes get "ISO C++ forbids" errors. Note, only sometimes. Here is the problematic area of code (this appears together without anything in between):
std::cout << "Is alpha? " << (int(*)(int))std::isalpha((char)Event.text.unicode) << "\n";
if ( (int(*)(int))std::isalpha((char)Event.text.unicode) == true)
{
std::cout << "Is alpha!\n";
//...snip...
}
The first instance, where I send the returned value to std::cout, works fine - I get no errors for this, I get the expected values (0 for non-alpha, 1 for alpha), and if that's the only place I try to disambiguate, the program compiles and runs fine.
The second instance, however, throws up this:
error: ISO C++ forbids comparison between pointer and integer
and only compiles if I remove the (int(*)(int)) snippet, at which point bad behavior ensues. Could someone enlighten me here?

You are casting the return value of the std::alpha() call to int(*)(int), and then compare that pointer to true. Comparing pointers to boolean values doesn't make much sense and you get an error.
Now, without the cast, you compare the int returned by std::alpha() to true. bool is an integer type, and to compare the two different integer types the values are first converted to the same type. In this case they are both converted to int. true becomes 1, and if std::isalpha() returned 2 the comparison ends up with 2 != 1.
If you want to compare the result of std::alpha() against a bool, you should cast that returned in to bool, or simply leave out the comparison and use something like if (std::isalpha(c)) {...}

There is no need to disambiguate, because the there is no ambiguity in a normal call.
Also, there is no need to use the std:: prefix when you get the function declaration from <ctype.h>, which after C++11 is the header you should preferably use (i.e., not <cctype>) – and for that matter also before C++11, but C++11 clinched it.
Third, you should not compare the result to true.
However, you need to cast a char argument to unsigned char, lest you get Undefined Behavior for anything but 7-bit ASCII.
E.g. do like this:
bool isAlpha( char const c )
{
typedef unsigned char UChar;
return !!isalpha( UChar( c ) );
}

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js