Will std::string end up being our compile-time string after all? - c++

Many developers and library authors have been struggling with compile-time strings for quite a few years now - as the standard (library) string, std::string, requires dynamic memory allocation, and isn't constexpr.
So we have a bunch of questions and blog posts about how to get compile-time strings right:
Conveniently Declaring Compile-Time Strings in C++
Concatenate compile-time strings in a template at compile time?
C++ Compile-Time string manipulation
(off-site) Compile-time strings with constexpr
We've now learned that not only is new available in constexpr code, allowing for dynamic allocation at compile-time, but, in fact, std::string will become constexpr in C++20 (C++ standard working group meeting report by Herb Sutter).
Does that mean that for C++20-and-up code we should chuck all of those nifty compile-time string implementations and just always go with std::string?
If not - when would we do so, and when would we stick to what's possible today (other than backwards-compatible code of course)?
Note: I'm not talking about strings whose contents are part of their type, i.e. not talking about the equivalent of std::integral_constant; that's definitely not going to be std::string.

It depends on what you mean by "constexpr string".
What C++20 allows you to do is to use std::string within a function marked constexpr (or consteval). Such a function can create a string, manipulate it, and so forth just like any literal type. However, that string cannot leak out into non-constexpr code; that would be a non-transient allocation and is forbidden.
The thing is, all of the examples you give are attempts to use strings as template parameters. That's a similar-yet-different thing. You're not just talking about building a string at compile-time; you now want to use it to instantiate a template.
C++20 solves this problem by allowing user-defined types to be template parameters. But the requirements on such types are much more strict than merely being literal types. The type must have no non-public data members and the only members are of types that follow those restrictions. Basically, the compiler needs to know that a byte-wise comparison of its data members represents an equivalent value. And even a constexpr-capable std::string doesn't work that way.
But std::array<char, N> can do that. And if you are in constexpr code, call a constexpr function which returns a std::string, and store that string in a constexpr value, then string::size() is a constexpr function. So you can use that to fill in the N for your array.
Copying the characters into a constexpr array (since it's a constexpr value, it's immutable) is a bit more involved, but it's doable.
So C++20 solves those problems, just not (directly) with std::string.

Related

Why is there no overload for printing `std::byte`?

The following code does not compile in C++20
#include <iostream>
#include <cstddef>
int main(){
std::byte b {65};
std::cout<<"byte: "<<b<<'\n';// Missing overload
}
When std::byte was added in C++17, why was there no corresponding operator<< overload for printing it? I can maybe understand the choice of not printing containers, but why not std::byte? It tries to act as a primitive type, and we even have overloads for std::string, the recent std::string_view, and, perhaps most closely related, std::complex; std::bitset itself can be printed.
There are also std::hex and similar modifiers, so printing 0-255 by default should not be an issue.
Was this just an oversight? What about operator>>? std::bitset has it, and it is not trivial at all.
EDIT: Found out even std::bitset can be printed.
From the paper on std::byte (P0298R3): (emphasis mine)
Design Decisions
std::byte is not an integer and not a character
The key motivation here is to make byte a distinct type – to improve program safety by leveraging the type system. This leads to the design that std::byte is not an integer type, nor a character type. It is a distinct
type for accessing the bits that ultimately make up object storage.
As such, it is not required to be implicitly convertible/interpreted to be either a char or any integral type whatsoever and hence cannot be printed using std::cout unless explicitly cast to the required type.
Furthermore, this question might help.
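For the cases where printing really is wanted, the explicit cast mentioned in the paper excerpt above might look like the sketch below. std::to_integer is the standard conversion for std::byte; byte_to_decimal and byte_to_hex are hypothetical helper names made up for this example.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats a byte as its decimal value (0-255).
std::string byte_to_decimal(std::byte b) {
    std::ostringstream out;
    out << std::to_integer<int>(b);  // explicit conversion, then normal int printing
    return out.str();
}

// Hypothetical helper: formats a byte as two lowercase hex digits.
std::string byte_to_hex(std::byte b) {
    std::ostringstream out;
    out << std::hex << std::setw(2) << std::setfill('0')
        << std::to_integer<int>(b);
    return out.str();
}
```

Writing std::cout << std::to_integer<int>(b) directly works the same way; the helpers only make the conversion visible at the call site.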
std::byte is intended for accessing raw data. To allow me to replace that damn uint8_t sprinkled all over the codebase with something that actually says "this is raw and unparsed", instead of something that could be misunderstood as a C string.
To underline: std::byte doesn't "try to be a primitive", it represents something even less - raw data.
That it's implemented like this is mostly a quirk of C++ and compiler implementations (layout rules for "primitive" types are much simpler than for a struct or a class).
This kind of thing is mostly found in low-level code where, honestly, printing shouldn't be used, and sometimes isn't even possible.
My use case, for example, is receiving raw bytes over I2C (or RS485) and parsing them into a frame which is then put into a struct. Why would I want to serialize raw bytes over actual data? Data I will have access to almost immediately?
To sum up this somewhat ranty answer, providing operator overloads for std::byte to work with iostream goes against the intent of this type.
And expressing intent in code as much as possible is one of the important principles in modern programming.

Avoiding the func(char *) api on embedded

Note:
I heavily changed my question to be more specific, but I will keep the old question at end of the post, in case it is useful to anyone.
New Question
I am developing an embedded application which uses the following types to represent strings :
string literals(null terminated by default)
std::array<char,size> (not null terminated)
std::string_view
I would like a function that accepts all of them in a uniform way. The problem is that for a string literal I have to compute the length with strlen, which does not work for the other two cases, while size() works for those two but not for case 1.
Should I use a variant like so: std::variant<const char *,std::span<char>> ? Would that be heavy by forcing myself to use std::visit ? Would that thing even match correctly all the different representations of strings?
Old Question
Disclaimer: when I refer to "string" in the following context I don't mean a std::string, just an abstract way to say a series of alphanumeric characters.
Most of the time when I deal with strings in C++ I use something like void func(const std::string &); or, in some cases, the same without the const and the reference. Now, on an embedded app, I don't have access to std::string, so I tried std::string_view. The problem is that a std::string_view constructed from a non-literal is sometimes not null terminated.
Edit: I changed the question a bit, as the comments gave some very helpful hints.
So even though y has a size in the example below:
std::array<char,5> x{"aa"} ;
std::string_view y(x.data());
I can't use y with a C API like printf("%s", y.data()), which relies on null termination:
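#include <array>
#include <cstdio>
#include <string_view>
int main(){
    std::array<char,5> x{"aaa"};
    std::string_view y(x.data());
    // %s expects a null-terminated const char*; passing x itself (a std::array) would be undefined behavior
    std::printf("%s", y.data());
}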
To summarize:
What can I do to implement a stack-allocated string that implicitly gets a static size in its constructors (from null-terminated strings, string literals, string_views and std::arrays) and is movable (or cheaply copyable)?
What would be the underlying type of my class? What would be the speed costs in comparison with the underlying type?
I think that you are looking at two largely and three subtly different semantics of char*.
Yes, all of them point at char, but a pointer does not carry the type-specific information on how to determine the length. Even in the ancient ancestor of C++ (not saying C...), not all pointers to char were the same: pointers to terminated and non-terminated sequences of characters could not be mixed.
In C++ the tool of overloading a function exists and it seems to be the obvious solution for your problem. You can still implement that efficiently with only one (helper) function doing the actual work, based on an explicit size information in a second parameter.
Overload the function which is "visible" on the API, with three versions for the three types. Have it determine the length in the appropriate way, then call the single helper function, providing that length.
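The overload-plus-helper layout described above might look roughly like this. It is a minimal sketch: count_spaces and count_spaces_impl are hypothetical names, and counting spaces merely stands in for whatever the real function does.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical helper doing the real work on an explicit (pointer, length) pair.
inline std::size_t count_spaces_impl(const char* data, std::size_t length) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < length; ++i) {
        if (data[i] == ' ') ++count;
    }
    return count;
}

// Null-terminated strings, including string literals: length comes from strlen.
inline std::size_t count_spaces(const char* text) {
    return count_spaces_impl(text, std::strlen(text));
}

// Fixed-size array without a terminator: length is the array size.
template <std::size_t N>
std::size_t count_spaces(const std::array<char, N>& text) {
    return count_spaces_impl(text.data(), N);
}

// View: the length travels with the view.
inline std::size_t count_spaces(std::string_view text) {
    return count_spaces_impl(text.data(), text.size());
}
```

A string literal picks the const char* overload because array-to-pointer decay is a better match than the user-defined conversion to std::string_view, so the three overloads do not become ambiguous.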

C++ - Significance of local names for types

I recently read about how classes are allowed to define their own local names for types. One famous example is size_type, provided by almost all STL containers. It was also mentioned that doing so helps hide implementation details from the user of the class. I am not quite sure how this is the case.
What are some examples where defining local names for types might be useful, and how does doing so hide implementation details?
It's more useful when you use templated algorithms or containers that assume your type has such a type alias. Even if you change the type behind size_type (say, from size_t to int for some reason), your type will still work with those algorithms and containers.
Beyond that, size_type is part of the interface the standard expects when, for example, you implement your own allocator (std::allocator_traits supplies a default if you omit it).
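To illustrate the point about templated code relying on the alias, here is a minimal sketch. SmallBuffer and half_size are hypothetical names invented for the example; the generic function only ever mentions Container::size_type, so it neither knows nor cares which concrete integer type each container picked.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical container whose author chose a narrow size type.
class SmallBuffer {
public:
    using size_type = unsigned short;  // could later change without touching callers
    explicit SmallBuffer(size_type count) : count_(count) {}
    size_type size() const { return count_; }
private:
    size_type count_;
};

// Generic code refers only to the alias, never to the concrete type.
template <typename Container>
typename Container::size_type half_size(const Container& c) {
    using size_type = typename Container::size_type;
    return static_cast<size_type>(c.size() / 2);
}

// The return type follows whatever each container declares.
static_assert(std::is_same_v<decltype(half_size(SmallBuffer(4))), unsigned short>);
static_assert(std::is_same_v<decltype(half_size(std::vector<int>(10))),
                             std::vector<int>::size_type>);
```

If SmallBuffer later switches size_type to another integer type, half_size keeps compiling unchanged, which is the sense in which the alias hides an implementation detail.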
Suppose you have a program where you define several variables of type size_type, which is defined somewhere as an int.
Then, upon analysis and reflection, you realize that the variables never assume values bigger than 10 thousand. The 32 bits used for each of these variables are therefore somewhat overkill. In this case, you can redefine size_type as short instead of int, and you will end up saving some memory.
Regarding the examples, you can check clock_t, char16_t, char32_t, wchar_t, true_type and false_type.

Why isn't std::hash<T> specialized for char*?

Why doesn't the C++ standard specify that std::hash<T> is specialized for char*, const char*, unsigned char*, const unsigned char*, etc? I.e., it would hash the contents of the C string until the terminating null is found.
Any harm in injecting my own specializations into the std namespace for my own code?
Why doesn't the C++ standard specify that std::hash<T> is specialized for char*, const char*, unsigned char*, const unsigned char*, etc?
It looks like it originated from proposal N1456. (emphasis mine)
Some earlier hash table implementations gave char* special treatment: it specialized the default hash function to look at character array being pointed to, rather than the pointer itself. This proposal removes that special treatment. Special treatment makes it slightly easier to use hash tables for C string, but at the cost of removing uniformity and making it harder to write generic code. Since naive users would generally be expected to use std::basic_string instead of C strings, the cost of special treatment outweighs the benefit.
If I'm interpreting this correctly, the reasoning is that supporting C style strings would break code that generically acts on hashes of pointers.
Any harm in injecting my own specializations into the std namespace for my own code?
There is potential harm, yes.
In the future, anything you add to the std namespace could collide with a new symbol name.
In the present, anything you add to the std namespace could be a "better match" for other components of the standard library, silently breaking behavior.
char* (and its ilk) doesn't always mean string. It can point to a simple byte array, a binary file dump, or any number of other things. If you mean a string in C++, you generally use std::string.
As for creating your own, given the above it's a bad idea. For user-defined types, though, it is acceptable to specialize std templates such as std::hash in the std namespace.
There is a standard specialization for pointer types, see here
template< class T > struct hash<T*>;
So, it can cover char* (as sequence of bytes not a C-style string) too.
If you mean a specialization for C-style strings, there is technically no problem implementing that. But since there is a specialization for std::string in C++, a specialization for C-style strings is not worth having.
As for the second part of your question: you can inject anything into the std namespace, but what do you gain? It defeats the purpose of namespaces. Keep to your own namespace territory.
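One way to stay in your own territory is to pass a custom hasher and equality predicate as template arguments instead of touching std. The following is a minimal sketch, assuming C++17 for std::string_view; CStringHash, CStringEqual and CStringMap are hypothetical names, and the keys must stay alive and null terminated for as long as they are in the map.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <string_view>
#include <unordered_map>

// Hashes the characters the pointer refers to, not the pointer value.
struct CStringHash {
    std::size_t operator()(const char* text) const noexcept {
        return std::hash<std::string_view>{}(std::string_view(text));
    }
};

// Compares contents, so two different buffers with equal text are the same key.
struct CStringEqual {
    bool operator()(const char* lhs, const char* rhs) const noexcept {
        return std::strcmp(lhs, rhs) == 0;
    }
};

// Hypothetical alias: a map keyed by C strings, with nothing added to namespace std.
template <typename Value>
using CStringMap = std::unordered_map<const char*, Value, CStringHash, CStringEqual>;
```

With the default std::hash<const char*>, two buffers holding the same text would normally be treated as different keys, because only the pointer values are hashed and compared.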