C++17: Purpose of std::from_chars and std::to_chars?

Before C++17, there existed a variety of methods to convert integers, floats, and doubles to and from strings. For example, std::stringstream, std::to_string, std::atoi, std::stoi, and others could be used to accomplish these tasks, and there are plenty of posts discussing the differences between those methods.
However, C++17 has now introduced std::from_chars and std::to_chars, and I'd like to know the reasons for introducing yet another means of converting to and from strings.
For one, what advantages and functionality do these new functions provide over the previous methods?
Not only that, but are there any notable disadvantages for this new method of string conversion?

std::stringstream is the heavyweight champion. It takes into consideration things like the stream's imbued locale, and its functionality involves things like constructing a sentry object for the duration of the formatted operation, in order to deal with exception-related issues. Formatted input and output operations in the C++ libraries have some reputation for being heavyweight and slow.
std::to_string is less intensive than std::ostringstream, but it still returns a std::string, whose construction likely involves dynamic allocation (less likely with modern short string optimization techniques, but still likely). And, in most cases the compiler still needs to generate all the verbiage, at the call site, to support a std::string object, including its destructor.
std::to_chars is designed to have as little footprint as possible. You provide the buffer, and std::to_chars does very little beyond actually formatting the numeric value into the buffer, in a specific format, without any locale-specific considerations, with the only overhead of making sure that the buffer is big enough. Code that uses std::to_chars does not need to do any dynamic allocation.
std::to_chars is also a bit more flexible in terms of formatting options, especially with floating point values. std::to_string has no formatting options.
std::from_chars is, similarly, a lightweight parser, that does not need to do any dynamic allocation, and does not need to sacrifice any electrons to deal with locale issues, or overhead of stream operations.
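To make that concrete, here is a minimal sketch of the intended usage (the buffer sizes and the printf-based output are my own choices, not part of the answer; the floating-point overload of std::to_chars also needs a fairly recent standard library):

#include <charconv>
#include <cstdio>

int main()
{
    char buf[32];   // caller-provided storage; no dynamic allocation anywhere
    auto [ptr, ec] = std::to_chars(buf, buf + sizeof buf, 3.14159);
    if (ec == std::errc{})                          // success: [buf, ptr) holds the text
        std::printf("%.*s\n", int(ptr - buf), buf);

    const char text[] = "42abc";
    int value = 0;
    auto [end, err] = std::from_chars(text, text + sizeof text - 1, value);
    if (err == std::errc{})                         // parsed "42"; end points at 'a'
        std::printf("%d\n", value);
}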

to/from_chars are designed to be elementary string conversion functions. They have two basic advantages over the alternatives.
They are much lighter weight. They never allocate memory (you allocate memory for them). They never throw exceptions. They also never look at the locale, which also improves performance.
Basically, they are designed such that it is impossible to have faster conversion functions at an API level.
These functions could even be constexpr (in C++17 they aren't, though I'm not sure why; the integer overloads were later made constexpr in C++23), while the more heavyweight allocating and/or throwing versions can't.
They have explicit round-trip guarantees. If you convert a float/double to a string (without a specified precision), the implementation is required to make it so that taking that exact sequence of characters and converting it back into a float/double will produce a binary-identical value. You won't get that guarantee from snprintf, stringstream or to_string/stof.
This guarantee only holds, however, if the to_chars and from_chars calls use the same implementation. So you can't expect to send the string across the Internet to some other computer that may be compiled with a different standard library implementation and get the same float. But it does give you on-computer serialization guarantees.
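A sketch of that round-trip guarantee (the buffer size is arbitrary, and the asserts assume both calls use the same standard library, as noted above):

#include <cassert>
#include <charconv>

int main()
{
    const double original = 0.1 + 0.2;   // a value with no short exact decimal form
    char buf[64];
    auto [end, ec] = std::to_chars(buf, buf + sizeof buf, original);   // default: shortest form
    assert(ec == std::errc{});

    double restored = 0.0;
    auto [p, err] = std::from_chars(buf, end, restored);
    assert(err == std::errc{});
    assert(restored == original);        // bit-identical on the same implementation
}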

All these pre-existing methods were bound to work based on a so-called locale. A locale is basically a set of formatting options that specify, e.g., what characters count as digits, what symbol to use for the decimal point, what thousands separator to use, and so on. Very often, however, you don't really need that. If you're just, e.g., reading a JSON file, you know the data is formatted in a particular way; there is no reason to be looking up whether a '.' should be a decimal point or not every time you see one. The new functions introduced in <charconv> are basically hardcoded to read and write numbers based on the formatting laid out for the default C locale. There is no way to change the formatting, but since the formatting doesn't have to be flexible, they can be very fast…
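A small sketch of that locale independence ("de_DE.UTF-8" is a hypothetical locale name that may not be installed on a given system; the parse is unaffected either way):

#include <charconv>
#include <clocale>
#include <cstdio>

int main()
{
    std::setlocale(LC_ALL, "de_DE.UTF-8");   // a locale whose decimal separator is ','

    const char text[] = "3.14";
    double d = 0.0;
    auto [ptr, ec] = std::from_chars(text, text + sizeof text - 1, d);

    // from_chars consumed all four characters: it always uses the '.' of the
    // default C locale, no matter what the global locale says.
    std::printf("consumed %d chars, ok: %d\n", int(ptr - text), int(ec == std::errc{}));
}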


Why can't you add int and float literals in F#?

In F#, if you write 3 + 2.5 you get a syntax error. You have to write e.g. 3. + 2.5 to make it work, which can get annoying in math-heavy domains with many numeric literals.
Seeing as many other languages (e.g. C#) handle this just fine, is there a particular reason why F# doesn't implicitly convert int literals to float (which is a lossless conversion as far as I know) when performing arithmetic operations?
It's true that int to float is "safe". However, the lack of implicit conversion between types in general is considered to be a good feature of F# by many, as others have mentioned.
F# has much more extensive type inference than C#. The types inferred from usage can be passed all the way through a large codebase. Implicit conversion between numeric types would complicate that inference, make type errors harder to understand, and increase the maintenance burden of the compiler code itself. In fact, F# doesn't perform any of the implicit conversions that C# defines.
By eliminating unnecessary casts, implicit conversions can improve source code readability. However, because implicit conversions do not require programmers to explicitly cast from one type to the other, care must be taken to prevent unexpected results.
Again, this decreases convenience but reduces the chance of incorrect behaviour, which can be a much bigger inconvenience later on, or for someone else.
Basically, this approach trades some convenience for another convenience (not having to write type names everywhere) and some increased safety/explicitness. I personally think it's a good trade-off for F#.
F# is a functional-first language, and one of the core values in functional languages is being able to reason about your code. That's a fancy way of saying it's easy to understand what your code is doing. Explicit operations mean that your code will be easier to reason about. Don't believe me?
Here is some python code that takes a number, turns it into a string, then back into a number, guess what it returns:
float(str(0.47000000000000003))
Did you guess 0.47000000000000003? Sorry, it's actually 0.46999999999999997! (At least in older versions of Python, whose str() truncated to 12 significant digits; Python 3's str() round-trips exactly.) There is all sorts of weirdness like that when converting between double, decimal, and float! Best to pick a type and stick to it. Now, constantly having to specify the type might seem annoying at first, but the value of never having to worry about what types your functions are using, vs. the types being sent in... god help you if a library chose types for you as well... well, let's just say you will appreciate the explicitness as time goes on ;)

dtoa vs sprintf vs Grisu3 algorithm

What is the best way to render double precision numbers as strings in C++?
I ran across the article "Here be dragons: advances in problems you didn’t even know you had", which discusses printing floating point numbers.
I have been using sprintf. I don't understand why I would need to modify my code.
If you are happy with sprintf_s you shouldn't change. However if you need to format your output in a way that is not supported by your library, you might need to reimplement a specialized version of sprintf (with any of the known algorithms).
For example JavaScript has very precise requirements on how its numbers must be printed (see section 9.8.1 of the specification). The correct output can't be accomplished by simply calling sprintf. Indeed, Grisu has been developed to implement correct number-printing for a JavaScript compiler.
Grisu is also faster than sprintf, but unless floating-point printing is a bottleneck in your application this should not be a reason to switch to a different library.
Ahah!
The problem outlined in the article you linked is that, for some numbers, the computer displays something that is theoretically correct but not what we humans would have used.
For example, as the article says, 1.2999999... = 1.3, so if your result is 1.3, it's (quite) correct for the computer to display it as 1.2999999... But that's not what you would have expected to see...
Now, why does the computer do that? The reason is that the computer computes in base 2 (binary) while we usually compute in base 10 (decimal). The results are the same (thank god!) but the internal storage and the representation are not.
Some numbers look nice when displayed in base 10, like 1.3 for example, but others don't, for example 1/3 = 0.333333333... It's the same in base 2: some numbers "look" nice in base 2 (usually those built from sums of powers of two, like 1/2 or 1/4) and others don't. When the computer stores a number internally, it may not be able to store it "exactly", and it stores the closest possible representation instead, even if the number looked "finite" in decimal. So yes, in this case, it "drifts" a little bit. If you do that again and again, you may lose precision. But there is no other way (unless you use special math libs able to store fractions).
The problem arises when the computer tries to give you back, in base 10, the number you gave it. Then the computer may give you 1.2999999 instead of the 1.3 you were expecting.
That's also the reason why you should be wary of comparing floats with ==, <, or >: two values that print the same may differ in their last bits, so it's usually safer to compare within a small tolerance (functions like islessgreater(a, b) only help with NaN handling, not with this rounding drift).
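As an illustration, a common tolerance-based comparison looks something like this (the epsilon value and the relative scaling are assumptions that depend on your data; this is a sketch, not a one-size-fits-all answer):

#include <algorithm>
#include <cmath>

// True when a and b differ by no more than eps, scaled by their magnitude.
bool nearly_equal(double a, double b, double eps = 1e-9)
{
    return std::fabs(a - b) <= eps * std::max({1.0, std::fabs(a), std::fabs(b)});
}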
So the actual function you use (sprintf) is fine and as exact as it can be; it gives you correct values. You just have to know that, when dealing with floats, 1.2999999 at maximum precision is OK if you were expecting 1.3.
Now if you want to "pretty print" those numbers to get the best "human" representation (base 10), you may want to use a special library, like your Grisu3, which will try to undo the drift that may have happened and align the number to the closest base-10 representation.
Now, the library cannot use a crystal ball to find which numbers drifted and which didn't, so it may happen that you really meant 1.2999999 at maximum precision as stored in the computer, and the lib will "convert" it to 1.3... But that's no worse nor less precise than displaying 1.2999999 instead of 1.3.
If you need good readability, such a lib will be useful. If not, it's just a waste of time.
Hope this helps!
The best way to do this in any reasonable language is:
Use your language's runtime library. Don't ever roll your own. Even if you have the knowledge and curiosity to write it, you don't want to test it and you don't want to maintain it.
If you notice any misbehavior from the runtime library conversion, file a bug.
If these conversions are a measurable bottleneck for your program, don't try to make them faster. Instead, find a way to avoid doing them at all. Instead of storing numbers as strings, just store the floating-point data (after possibly controlling for endianness). If you need a string representation, use a hexadecimal floating-point format instead.
I don't mean to discourage you, or anyone. These are actually fascinating functions to work on, but they are also shockingly complex, and trying to design good test coverage for any non-naive implementation is even more involved. Don't get started unless you're prepared to spend months thinking about the problem.
You might want to use something like Grisu (or a faster method) because it gives you the shortest decimal representation with a round-trip guarantee, unlike sprintf, which only takes a fixed precision. The good news is that C++20 includes std::format, which gives you this by default. For example:
printf("%.*g", std::numeric_limits<double>::max_digits10, 0.3);
prints 0.29999999999999999 while
puts(fmt::format("{}", 0.3).c_str());
prints 0.3 (godbolt).
In the meantime you can use the {fmt} library, which std::format is based on. {fmt} also provides the print function that makes this even easier and more efficient (godbolt):
fmt::print("{}", 0.3);
Disclaimer: I'm the author of {fmt} and C++20 std::format.
In C++ why aren't you using iostreams? You should probably be using cout for the console and ostringstream for string-oriented output (unless you have a very specific need to use a printf family method).
You shouldn't worry about formatting performance unless actual profiling shows that CPU is the bottleneck (compared to say I/O).
#include <sstream>

void outputdouble( std::ostringstream & oss, double d )
{
    oss.precision( 5 );   // five significant digits
    oss << d;
}
http://www.cplusplus.com/reference/iostream/ostringstream/

byte representation of numbers

I use the following way to get the byte representation of numbers in C++:
template< class T >
union u_value
{
    T value;
    unsigned char bytes[ sizeof( T ) ];
};   // note the required semicolon
Please tell me, is that the true way? And if not, why not, and how should I do it?
There is no "true" way. What you did is one way to do it. Generally speaking, stuff like this is discouraged as it usually results in non-portable code. Sometimes there are good reasons for poking internals like that, but since you didn't tell what you're about to do with the "byte representation", there's little we can do to judge if this approach is appropriate.
Edit: So networking is your subject here. In this case, either:
You are transferring POD types only (char, short, int, and the like). In this case, you might want to look into <netinet/in.h>, where you'll find the macros htons(), htonl(), ntohs(), and ntohl(), which do host-to-network byte order conversion and vice versa for you (see the sketch below).
You are transferring complex types (e.g. classes). In this case, you might want to look into Boost Serialization, because there's much more to be considered here than mere byte order.
Either way, it is advisable to use ready-made, well-documented and -understood code, instead of doing byte-juggling yourself.
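For the POD case, a minimal sketch (POSIX headers shown; on Windows the same functions live in <winsock2.h>):

#include <arpa/inet.h>   // htonl()/ntohl()
#include <cstdint>
#include <cstdio>

int main()
{
    std::uint32_t host_value = 0x11223344;
    std::uint32_t wire_value = htonl(host_value);   // host -> network (big-endian) order
    // ...wire_value is what you would actually write to the socket...
    std::uint32_t received = ntohl(wire_value);     // network -> host order on the receiver
    std::printf("%08x -> %08x\n", (unsigned)host_value, (unsigned)received);
}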
Not the true way. This approach is not portable and may often result in undefined behavior, because:
Padding (inserted by the compiler) is ignored.
The extra space for virtual function machinery is not visibly accounted for (if T is polymorphic).
If you are transferring the bytes across the network, you need to be careful about endianness.
A more portable alternative is sketched below.
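A sketch of that alternative (the helper name is mine; memcpy from a trivially copyable object is well-defined, and std::bit_cast serves the same purpose in C++20):

#include <cstdio>
#include <cstring>

template <class T>
void print_bytes(const T& value)
{
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));   // well-defined for trivially copyable T
    for (unsigned char b : bytes)
        std::printf("%02x ", b);
    std::printf("\n");
}

int main()
{
    print_bytes(3.14);   // byte order shown is whatever the host uses (endianness!)
}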
This will work. Just be aware that anything with virtual functions, pointers, and so on won't be accurate the next time you deserialize it.

Testing buffer overrun of input

For example, if I input more than 10 characters, why doesn't it throw an exception or error?
Would you get the input with getline instead?
#include <iostream>
using namespace std;

int main()
{
    char c[10];        // room for 9 characters plus the terminating null
    while (cin >> c)   // writes past the end of c for longer words (before C++20)
    {
        cout << c << endl;
    }
}
Why doesn't it throw an exception or error?
A buffer overflow is an example of undefined behavior. The behavior is literally undefined: if you overflow a buffer, there are no guarantees whatsoever about what your program will do. This doesn't generate an exception because doing so would require lots of relatively costly checks even in correct code, and in C++ the general philosophy is that you don't pay for what you don't need.
If you avoid raw arrays and raw (non-smart) pointers and use the C++ Standard Library containers, strings, and algorithms, you can easily avoid most situations that would result in a buffer overflow.
Would you get the input with getline instead?
You can either use std::getline, which allows you to extract a "line" of characters into a std::string, or you can use >> and extract into a std::string object directly, depending on what, exactly, you want to extract.
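For instance, a sketch of the std::getline variant (there is no fixed-size buffer left to overrun):

#include <iostream>
#include <string>

int main()
{
    std::string line;
    while (std::getline(std::cin, line))   // the string grows as needed
    {
        std::cout << line << '\n';
    }
}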
there are tools which attempt to expose these issues. valgrind and GuardMalloc are examples of this. as well, msvc allows you to specify build options which can expose such issues.
note also that different compilers emit different instructions based on your program, and different instructions when optimizing or not. this means the consequences may exist in some builds, and may not exist in others.
i occasionally test my programs using the tools/techniques i've mentioned. i also use more dynamic allocations in unit tests, in order to expose failure cases more easily when running programs with these tools.
if you're coming from java or another language which integrates smarter arrays: that's not how c programs are interpreted by the compiler, nor is it how they are represented in memory. instead, we typically use proper containers in c++. these will detect many of these issues. for example, a std::vector may throw if you attempt to access an invalid element (see the sketch below).
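for instance, a sketch of the bounds-checked access (the container contents are arbitrary):

#include <cstdio>
#include <stdexcept>
#include <vector>

int main()
{
    std::vector<int> v{1, 2, 3};
    try {
        std::printf("%d\n", v.at(10));   // at() bounds-checks and throws
    } catch (const std::out_of_range&) {
        std::puts("caught out-of-range access");
    }
}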
good luck

Safe counterparts of itoa()?

I am converting an old C program to a more secure version. The following functions are used heavily; could anyone tell me their secure counterparts? Either Windows functions or C runtime library functions. Thanks.
itoa()
getchar()
strcat()
memset()
itoa() is safe as long as the destination buffer is big enough to receive the largest possible representation (i.e. of INT_MIN with the trailing NUL). So, you can simply check the buffer size. Still, it's not a very good function to use, because if you change your data type to a larger integral type, you need to switch to a different function (e.g. the equally non-standard ltoa, lltoa, etc.). If you want a dynamic buffer that handles whatever type you throw at it with fewer maintenance issues, consider a std::ostringstream (from the <sstream> header).
getchar() has no "secure counterpart" - it's not insecure to begin with and has no buffer overrun potential.
Re memset(): it's dangerous in that it accepts the programmer's judgement that memory should be overwritten without any confirmation of the content/address/length, but when used properly it leaves no issue, and sometimes it's the best tool for the job even in modern C++ programming. To check security issues with this, you need to inspect the code and ensure it's aimed at a suitable buffer or object to be zeroed, and that the length is computed properly (hint: use sizeof where possible).
strcat() can be dangerous if the strings being concatenated aren't known to fit into the destination buffer. For example: char buf[16]; strcpy(buf, "one,"); strcat(buf, "two"); is all totally safe (but fragile, as further operations or changing either string might require more than 16 chars and the compiler won't warn you), whereas strcat(buf, argv[0]) is not. The best replacement tends to be a std::ostringstream, although that can require significant reworking of the code. You may get away with strncat(), or even - if you have it - asprintf(&result, "%s%s", first, second), which will allocate the required amount of memory on the heap (do remember to free() it). You could also consider std::string and use operator+ to concatenate strings.
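A short sketch of those suggested replacements (the concatenated values are arbitrary examples):

#include <cstdio>
#include <sstream>
#include <string>

int main()
{
    // itoa() replacement: ostringstream handles any integral type, no buffer sizing
    std::ostringstream oss;
    oss << -123456789L;

    // strcat() replacement: std::string grows as needed, so nothing to overflow
    std::string joined = std::string("one,") + "two," + oss.str();
    std::printf("%s\n", joined.c_str());
}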
None of these functions is "insecure" provided you understand the behaviour and limitations. itoa is not standard C and should be replaced with sprintf(buf, "%d", ...) (or better, snprintf) if that's a concern to you.
The others are all fine to the experienced practitioner. If you have specific cases which you think may be unsafe, you should post them.
I'd replace itoa(), because it's not standard, with sprintf or, better, snprintf if your goal is code security. I'd also replace strcat() with strncat() but, since you specified the C++ language too, an even better idea would be to use the std::string class.
As for the other two functions, I can't see how you could make the code more secure without seeing your code.