Avoiding the func(char *) api on embedded - c++

Note:
I heavily changed my question to be more specific, but I will keep the old question at end of the post, in case it is useful to anyone.
New Question
I am developing an embedded application which uses the following types to represent strings :
string literals(null terminated by default)
std::array<char,size> (not null terminated)
std::string_view
I would like to have a function that accepts all of them in a uniform way. The only problem is that if the input is a string literal I will have to count the size with strlen that in both other cases doesn't work but if I use size it will not work on case 1.
Should I use a variant like so: std::variant<const char *,std::span<char>> ? Would that be heavy by forcing myself to use std::visit ? Would that thing even match correctly all the different representations of strings?
Old Question
Disclaimer when I refer to "string" in the following context I don't mean an std::string but just an abstract way to say alphanumeric series.
Most of the cases when I have to deal with strings in c++ I use something like void func(const std::string &); or without the const and the reference at some cases.Now on an embedded app I don't have access to std::string and I tried to use std::string_view the problem is that std::string_view when constructed from a non literal sometimes is not null terminated
Edit: I changed the question a bit as the comments implied some very helphull hints .
So even though y has a size in the example below:
std::array<char,5> x{"aa"} ;
std::string_view y(x.data());
I can't use y with a c api like printf(%s,y.data()) that is based on null termination
#include <array>
#include <string_view>
#include "stdio.h"
int main(){
std::array<char,5> x{"aaa"};
std::string_view y(x.data());
printf("%s",x);
}
To summarize:
What can I do to implement a stack allocated string that implicitly gets a static size at its constructors (from null terminated strings,string literals, string_views and std::arrays) and it is movable (or cheap copyable)?
What would be the underlying type of my class? What would be the speed costs in comparison with the underlying type?

I think that you are looking at two largely and three subtly different semantics of char*.
Yes, all of them point at char but the type-specific info on how to determine the length is not carried by that. Even in the ancient ancestor of C++ (not saying C...) a pointer to char was not always the same. Already there pointers to terminated and non-terminated sequences of characters could not be mixed.
In C++ the tool of overloading a function exists and it seems to be the obvious solution for your problem. You can still implement that efficiently with only one (helper) function doing the actual work, based on an explicit size information in a second parameter.
Overload the function which is "visible" on the API, with three versions for the three types. Have it determine the length in the appropriate way, then call the single helper function, providing that length.

Related

is std::string now usable as compile time constant? [duplicate]

Many developers and library authors have been struggling with compile-time strings for quite a few years now - as the standard (library) string, std::string, requires dynamic memory allocation, and isn't constexpr.
So we have a bunch of questions and blog posts about how to get compile-time strings right:
Conveniently Declaring Compile-Time Strings in C++
Concatenate compile-time strings in a template at compile time?
C++ Compile-Time string manipulation
(off-site) Compile-time strings with constexpr
We've now learned that not only is new available in constexpr code, allowing for dynamical allocation at compile-time, but, in fact, std::string will become constexpr in C++20 (C++ standard working group meeting report by Herb Sutter).
Does that mean that for C++20-and-up code we should chuck all of those nifty compile-time string implementations and just always go with std::string?
If not - when would we do so, and when would we stick to what's possible today (other than backwards-compatible code of course)?
Note: I'm not talking about strings whose contents is part of their type, i.e. not talking about the equivalent of std::integral_constant; that's definitely not going to be std::string.
It depends on what you mean by "constexpr string".
What C++20 allows you to do is to use std::string within a function marked constexpr (or consteval). Such a function can create a string, manipulate it, and so forth just like any literal type. However, that string cannot leak out into non-constexpr code; that would be a non-transient allocation and is forbidden.
The thing is, all of the examples you give are attempts to use strings as template parameters. That's a similar-yet-different thing. You're not just talking about building a string at compile-time; you now want to use it to instantiate a template.
C++20 solves this problem by allowing user-defined types to be template parameters. But the requirements on such types are much more strict than merely being literal types. The type must have no non-public data members and the only members are of types that follow those restrictions. Basically, the compiler needs to know that a byte-wise comparison of its data members represents an equivalent value. And even a constexpr-capable std::string doesn't work that way.
But std::array<char, N> can do that. And if you are in constexpr code, call a constexpr function which returns a std::string, and store that string in a constexpr value, then string::size() is a constexpr function. So you can use that to fill in the N for your array.
Copying the characters into a constexpr array (since it's a constexpr value, it's immutable) is a bit more involved, but it's doable.
So C++20 solves those problem, just not (directly) with std::string.

Will std::string end up being our compile-time string after all?

Many developers and library authors have been struggling with compile-time strings for quite a few years now - as the standard (library) string, std::string, requires dynamic memory allocation, and isn't constexpr.
So we have a bunch of questions and blog posts about how to get compile-time strings right:
Conveniently Declaring Compile-Time Strings in C++
Concatenate compile-time strings in a template at compile time?
C++ Compile-Time string manipulation
(off-site) Compile-time strings with constexpr
We've now learned that not only is new available in constexpr code, allowing for dynamical allocation at compile-time, but, in fact, std::string will become constexpr in C++20 (C++ standard working group meeting report by Herb Sutter).
Does that mean that for C++20-and-up code we should chuck all of those nifty compile-time string implementations and just always go with std::string?
If not - when would we do so, and when would we stick to what's possible today (other than backwards-compatible code of course)?
Note: I'm not talking about strings whose contents is part of their type, i.e. not talking about the equivalent of std::integral_constant; that's definitely not going to be std::string.
It depends on what you mean by "constexpr string".
What C++20 allows you to do is to use std::string within a function marked constexpr (or consteval). Such a function can create a string, manipulate it, and so forth just like any literal type. However, that string cannot leak out into non-constexpr code; that would be a non-transient allocation and is forbidden.
The thing is, all of the examples you give are attempts to use strings as template parameters. That's a similar-yet-different thing. You're not just talking about building a string at compile-time; you now want to use it to instantiate a template.
C++20 solves this problem by allowing user-defined types to be template parameters. But the requirements on such types are much more strict than merely being literal types. The type must have no non-public data members and the only members are of types that follow those restrictions. Basically, the compiler needs to know that a byte-wise comparison of its data members represents an equivalent value. And even a constexpr-capable std::string doesn't work that way.
But std::array<char, N> can do that. And if you are in constexpr code, call a constexpr function which returns a std::string, and store that string in a constexpr value, then string::size() is a constexpr function. So you can use that to fill in the N for your array.
Copying the characters into a constexpr array (since it's a constexpr value, it's immutable) is a bit more involved, but it's doable.
So C++20 solves those problem, just not (directly) with std::string.

Why is the `std::sto`... series not a template?

I wonder if there is a reason why the std::sto series (e.g. std::stoi, std::stol) is not a function template, like that:
template<typename T>
T sto(std::string const & str, std::size_t *pos = 0, int base = 10);
and then:
template<>
int sto<int>(std::string const & str, std::size_t *pos, int base)
{
// do the stuff.
}
template<>
long sto<long>(std::string const & str, std::size_t *pos, int base)
{
// do the stuff.
}
/* etc. */
In my sense, that would be a better design, because for the moment, when I have to convert a string in whatever numerical value an user want, I have to manually manage each case.
Is there a reason to not have such a template function? Is there an assumed choice, or is this just done like that?
Looking at the description of these functions at cppref, I note the following:
... Interprets a signed integer value in the string str.
1) calls std::strtol(str.c_str(), &ptr, base)...
and strol a "C" standard function that's also available in C++.
Reading further, we see: (for the c++ sto* functions):
Return value
The string converted to the specified signed integer type.
Exceptions
std::invalid_argument if no conversion could be performed
std::out_of_range if the converted value would fall out of the range of the result type or if the underlying function (std::strtol or
std::strtoll) sets errno to ERANGE.
So while I have no original source for this, and indeed have never worked with these functions, I would guess that:
TL;DR : These functions are C++-ish wrappers around already existing C/C++ functions -- strtol* -- so they resemble these functions as close as possible.
I have to manage manually each case. Is there a reason to not have such a template function?
In case of such questions, Eric Lippert (C#) usually says something along the lines:
If a feature is missing, then it's missing because noone implemented it yet. And that's because either noone else earlier wanted yet, or because it was considered not worth the effort, or because it couldn't have been finished before publishing the current release".
Here, I guess it's the "not worth" part, but I have neither asked the commitee about, nor managed to find any answer in old questions and faqs. I didn't spend much time searching though.
I say this because I suppose that most common of these functions' functionality (if not all of) is already contained in stream classes, like istringstream. Just like cin/etc, this one also has an all-having operator >>, overloaded for all base numeric types (and more).
Furthermore, the stream manipulators like std::hex (std::setbase) already solve the problem of passing various type-dependent configuration parameters to the actual conversion functions. No problems with mixed function signatures (like those mentioned by DavidHaim in his answer). Here's just a single operator>>.
So.. since if we have it in streams, if we already can read numbers/etc from strings with simple foo >> bar >> setbase(42) >> baz >> ..., then I think it was not worth the effort to add more complicated layers to old C runtime functions.
No proof for that though. Just a hunch.
The problem with template specialization is that the specialization requires you to match the original template function signature, so each specialization must implement the interface of (string,pos,base).
If you would like to have some other type which does not follows this interface, you are in trouble.
Suppose that, in the future, we would like to have sto<std::pair<int,int>>. We will want to have pos and base for the first and the second stringified integer. we would like the signature to be in the form of string,pos1,base1,pos2,base2. Since sto signature is already set, we cannot do it.
You can always wrap std::sto* in your implementation of sto for integral types, but you cannot do that the other way around.
The purpose of these functions is to provide simple conversions for common cases. They are not intended as a general-purpose conversion suite. std::ostringstream is much better for that kind of thing.
In my sense, there would be a better design, because for the moment,
when I have to convert a string in whatever numerical value an user
want, I have to manage manually each case.
No, it would not. Templates goal (deliberately setting T-MP apart) is not to replace overloading; you should always prefer overloading to templates. Actually, it's something the language already does for you! Between a candidate function and a possible template instantation, the former will be prefered. Using language features for the sake of it is bad.
I don't see how templates could help either. Whatever type the user decides to input, it won't be known till runtime, and template types are deduced at compile time. C++ is a statically typed language. In this case, templates will just add an unneeded layer of complexity over normal function overloading.

How string accepting interface should look like?

This is a follow up of this question. Suppose I write a C++ interface that accepts or returns a const string. I can use a const char* zero-terminated string:
void f(const char* str); // (1)
The other way would be to use an std::string:
void f(const string& str); // (2)
It's also possible to write an overload and accept both:
void f(const char* str); // (3)
void f(const string& str);
Or even a template in conjunction with boost string algorithms:
template<class Range> void f(const Range& str); // (4)
My thoughts are:
(1) is not C++ish and may be less efficient when subsequent operations may need to know the string length.
(2) is bad because now f("long very long C string"); invokes a construction of std::string which involves a heap allocation. If f uses that string just to pass it to some low-level interface that expects a C-string (like fopen) then it is just a waste of resources.
(3) causes code duplication. Although one f can call the other depending on what is the most efficient implementation. However we can't overload based on return type, like in case of std::exception::what() that returns a const char*.
(4) doesn't work with separate compilation and may cause even larger code bloat.
Choosing between (1) and (2) based on what's needed by the implementation is, well, leaking an implementation detail to the interface.
The question is: what is the preffered way? Is there any single guideline I can follow? What's your experience?
Edit: There is also a fifth option:
void f(boost::iterator_range<const char*> str); // (5)
which has the pros of (1) (doesn't need to construct a string object) and (2) (the size of the string is explicitly passed to the function).
If you are dealing with a pure C++ code base, then I would go with #2, and not worry about callers of the function that don't use it with a std::string until a problem arises. As always, don't worry too much about optimization unless there is a problem. Make your code clean, easy to read, and easy to extend.
There is a single guideline you can follow: use (2) unless you have very good reasons not to.
A const char* str as parameter does not make it explicit, what operations are allowed to be performed on str. How often can it be incremented before it segfaults? Is it a pointer to a char, an array of chars or a C string (i.e. a zero-terminated array of char)?
I don't really have a single hard preference. Depending on circumstances, I alternate between most of your examples.
Another option I sometimes use is similar to your Range example, but using plain old iterator ranges:
template <typename Iter>
void f(Iter first, Iter last);
which has the nice property that it works easily with both C-style strings (and allows the callee to determine the length of the string in constant time) as well as std::string.
If templates are problematic (perhaps because I don't want the function to be defined in a header), I sometimes do the same, but using char* as iterators:
void f(const char* first, const char* last);
Again, it can be trivially used with both C-strings and C++ std::string (as I recall, C++03 doesn't explicitly require strings to be contiguous, but every implementation I know of uses contiguously allocated strings, and I believe C++0x will explicitly require it).
So these versions both allow me to convey more information than the plain C-style const char* parameter (which loses information about the string length, and doesn't handle embedded nulls), in addition to supporting both of the major string types (and probably any other string class you can think of) in an idiomatic way.
The downside is of course that you end up with an additional parameter.
Unfortunately, string handling isn't really C++'s strongest side, so I don't think there is a single "best" approach. But the iterator pair is one of several approaches I tend to use.
For taking a parameter I would go with whatever is simplest and often that is const char*. This works with string literals with zero cost and retrieving a const char* from something stored in a std:string is typically very low cost as well.
Personally, I wouldn't bother with the overload. In all but the simplest cases you will want to merge to two code paths and have one call the other at some point or both call a common function. It could be argued that having the overload hides whether one is converted to the other or not and which path has a higher cost.
Only if I actually wanted to use const features of the std::string interface inside the function would I have const std::string& in the interface itself and I'm not sure that just using size() would be enough of a justification.
In many projects, for better or worse, alternative string classes are often used. Many of these, like std::string give cheap access to a zero-terminated const char*; converting to a std::string requires a copy. Requiring a const std::string& in the interface is dictating a storage strategy even when the internals of the function don't need to specify this. I consider it this to be undesirable, much like taking a const shared_ptr<X>& dictates a storage strategy whereas taking X&, if possible, allows the caller to use any storage strategy for a passed object.
The disadvantages of a const char* are that, purely from an interface standpoint, it doesn't enforce non-nullness (although very occasionally the difference betweem a null parameter and an empty string is used in some interfaces - this can't be done with std::string), and a const char* might be the address of just a single character. In practice, though, the use of a const char* to pass a string is so prevalent that I would consider citing this as a negative to be a fairly trivial concern. Other concerns, such as whether the encoding of the characters specified in the interface documentation (applies to both std::string and const char*) are much more important and likely to cause more work.
The answer should depend heavily on what you are intending to do in f. If you need to do some complex processing with the string, the approach 2 makes sense, if you simply need to pass to some other functions, then select based on those other functions (let's say for arguments sake you are opening a file - what would make most sense? ;) )
It's also possible to write an
overload and accept both:
void f(const string& str) already accepts both because of the implicit conversion from const char* to std::string. So #3 has little advantage over #2.
I would choose void f(const string& str) if the function body does not do char-analysis; means it's not referring to char* of str.
Use (2).
The first stated problem with it is not an issue, because the string has to be created at some point regardless.
Fretting over the second point smells of premature optimization. Unless you have a specific circumstance where the heap allocation is problematic, such as repeated invocations with string literals, and those cannot be changed, then it is better to favor clarity over avoiding this pitfall. Then and only then might you consider option (3).
(2) clearly communicates what the function accepts, and has the right restrictions.
Of course, all 5 are improvements over foo(char*) which I have encountered more than I would care to mention.

operator char* in STL string class

Why doesn't the STL string class have an overloaded char* operator built-in? Is there any specific reason for them to avoid it?
If there was one, then using the string class with C functions would become much more convenient.
I would like to know your views.
Following is the quote from Josuttis STL book:
However, there is no automatic type
conversion from a string object to a
C-string. This is for safety reasons
to prevent unintended type conversions
that result in strange behavior (type
char* often has strange behavior) and
ambiguities (for example, in an
expression that combines a string and
a C-string it would be possible to
convert string into char* and vice
versa). Instead, there are several
ways to create or write/copy in a
C-string, In particular, c_str() is
provided to generate the value of a
string as a C-string (as a character
array that has '\0' as its last
character).
You should always avoid cast operators, as they tend to introduce ambiguities into your code that can only be resolved with the use of further casts, or worse still compile but don't do what you expect. A char*() operator would have lots of problems. For example:
string s = "hello";
strcpy( s, "some more text" );
would compile without a warning, but clobber the string.
A const version would be possible, but as strings must (possibly) be copied in order to implement it, it would have an undesirable hidden cost. The explicit c_str() function means you must always state that you really intend to use a const char *.
The string template specification deliberately allows for a "disconnected" representation of strings, where the entire string contents is made up of multiple chunks. Such a representation doesn't allow for easy conversion to char*.
However, the string template also provides the c_str method for precisely the purpose you want: what's wrong with using that method?
By 1998-2002 it was hot topic of c++ forums. The main problem - zero terminator. Spec of std::?string allows zero character as normal, but char* string doesn't.
You can use c_str instead:
string s("I like rice!");
const char* cstr = s.c_str();
I believe that in most cases you don't need the char*, and can work more conveniently with the string class itself.
If you need interop with C-style functions, using a std::vector<char> / <wchar_t> is often easier.
It's not as convenient, and unfortunately you can't O(1)-swap it with a std::string (now that would be a nice thing).
In that respect, I much prefer the interface of MFC/ATL CString which has stricter performance guarantees, provides interop, and doesn't treat wide character/unicode strings as totally foreign (but ok, the latter is somewhat platform specific).