C++ constexpr std::array of string literals

C++ constexpr std::array of string literals - c++

I've been happily using the following style of constant string literals in my code for awhile, without really understanding how it works:
constexpr std::array myStrings = { "one", "two", "three" };
This may seem trivial, but I'm hazy on the details of what is going on under the hood. From my understanding, class template argument deduction (CTAD) is used to construct an array of the appropriate size and element type. My questions would be:
What is the element type of the std::array in this case, or is this implementation specific? Looking at the debugger (I'm using Microsoft C++), the elements are just pointers to non-contiguous locations.
Is it safe to declare constexpr arrays of string literals in this way?
I could do this instead, but it's not as tidy:
const std::array<std::string, 3> myOtherStrings = { "one", "two", "three" };

Yes, this is CTAD deducing the template arguments for you. (since C++17)
std::array has a deduction guide which enables CTAD with this form of initializer.
It will deduce the type of myStrings to
const std::array<const char*, 3>
The const char* is the result of usual array-to-pointer decay being applied to the elements of the initializer list (which are arrays of const chars).
const in front is a consequence of constexpr.
Each element of the array will point to the corresponding string literal.
constexpr is safe and you can use the array elements as you would individual string literals via const char* pointer. In particular trying to modify these literals or the array via const_cast will have undefined behavior though.
const std::array<std::string, 3> also works, but will not be usable in constant expressions. constexpr is not allowed on this because of std:string.
CTAD can also be used to deduce this type though with the help of string literal operators:
#include<string>
using namespace std::string_literals;
//...
const std::array myOtherStrings = { "one"s, "two"s, "three"s };
or since C++20:
const auto myOtherStrings = std::to_array<std::string>({ "one", "two", "three" });

As user17732522 already noted, the type deduction for your original code produces a const std::array<const char*, 3>. This works, but it's not a C++ std::string, so every use needs to scan for the NUL terminator, and they can't contain embedded NULs. I just wanted to emphasize the suggestion from my comment to use std::string_view.
Since std::string inherently relies on run-time memory allocation, you can't use it unless the entirety of the associated code is also constexpr (so no actual strings exist at all at run-time, the compiler computes the final result at compile-time), and that's unlikely to help you here if the goal is to avoid unnecessary runtime work for something that is partially known at compile time (especially if the array gets recreated on each function call; it's not global or static, so it's done many times, not just initialized once before use).
That said, if you can rely on C++17, you can split the difference with std::string_view. It's got a very concise literal form (add sv as a prefix to any string literal), and it's fully constexpr, so by doing:
// Top of file
#include <string_view>
// Use one of your choice:
using namespace std::literals; // Enables all literals
using namespace std::string_view_literals; // Enables sv suffix only
using namespace std::literals::string_view_literals; // Enables sv suffix only
// Point of use
constexpr std::array myStrings = { "one"sv, "two"sv, "three"sv };
you get something that involves no runtime work, has most of the benefits of std::string (knows its own length, can contain embedded NULs, accepted by most string-oriented APIs), and therefore operates more efficiently than a C-style string for the three common ways a function accepts string data:
For modern APIs that need to read a string-like thing, they accept std::string_view by value and the overhead is just copying the pointer and length to the function
For older APIs that accept const std::string&, it constructs a temporary std::string when you call it, but it can use the constructor that extracts the length from the std::string_view so it doesn't need to prewalk a C-style string with strlen to figure out how much to allocate.
For any API that needs a std::string (because it will modify/store its own copy), they're receiving string by value, and you get the same benefit as in #2 (it must be built, but it's built more efficiently).
The only case where you do worse by using std::string_views than using std::string is case #2 (where if the std::array contained std::strings, no copies would occur), and you only lose there if you make several such calls; in that scenario, you'd just bite the bullet and use const std::array myStrings = { "one"s, "two"s, "three"s };, paying the minor runtime cost to build real strings in exchange for avoiding copies when passing to old-style APIs taking const std::string&.

Related

C++ - std::initializer_list vs std::span

What is the difference between std::initializer_list and std::span? Both are contiguous sequences of values of some type. Both are non-owning.
So when do we use the first and when do we use the latter?

The short answer is that std::initializer_list<T> is used to create a new range, for the purposes of initialization. While std::span<T> is used to refer to existing ranges, for better APIs.
std::initializer_list<T> is a language feature that actually constructs a new array and owns it. It solves the problem of how to conveniently initialize containers:
template <typename T>
struct vector {
vector(std::initializer_list<T>);
};
vector<int> v = {1, 2, 3, 4};
That creates a std::initializer_list<int> on the fly containing the four integers there, and passes it into vector so that it can construct itself properly.
This is really the only place std::initializer_list<T> should be used: either constructors or function parameters to pass a range in on the fly, for convenience (unit tests are a common place that desires such convenience).
std::span<T> on the other hand is used to refer to an existing range. Its job is to replace functions of the form:
void do_something(int*, size_t);
with
void do_something(std::span<int>);
Which makes such functions generally easier to use and safer. std::span is constructible from any appropriate contiguous range, so taking our example from earlier:
std::vector<int> v = {1, 2, 3, 4};
do_something(v); // ok
It can also be used to replace functions of the form:
void do_something_else(std::vector<unsigned char> const&);
Which can only be called with, specifically, a vector, with the more general:
void do_something_else(std::span<unsigned char const>);
Which can be called with any backing contiguous storage over unsigned char.
With span you have to be careful, since it's basically a reference that just doesn't get spelled with a &, but it is an extremely useful type.

The principle differences between the two are how the language treats them. Or more specifically, how it doesn't in the case of a span.
You cannot create an initializer_list from an existing container object or array. They can only be created by copying from other initializer_lists or from a braced-init-list (ie: { stuff }). That is, the compiler governs the creation of an initializer_list and the array it points into.
Similarly, a constructor that takes only an initializer_list has special meaning to the compiler. When performing list initialization on that type, such constructors are given priority in overload resolution. span is given no special meaning by the compiler.
initializer_list also has lifetime rules that are different from span. The lifetime of the array pointed to by an initializer_list is the lifetime of the initializer_list object that was created from a braced-init-list. This is not true of span; the lifetime of a span created from a container is whatever you say it is based on how you use it in code.
Broadly speaking, you should only use initializer_list when you're initializing an object.

One difference is that std::span can be used dynamically, unlike std::initializer_list.

In C++ what is the point of std::array if the size has to be determined at compile time?

Pardon my ignorance, it appears to me that std::array is meant to be an STL replacement for your regular arrays. But because the array size has to be passed as a template parameter, it prevents us from creating std::array with a size known only at runtime.
std::array<char,3> nums {1,2,3}; // Works.
constexpr size_t size = 3;
std::array<char,size> nums {1,2,3}; // Works.
const buf_size = GetSize();
std::array<char, buf_size> nums; // Doesn't work.
I would assume that one very important use case for an array in C++ is to create a fixed size data structure based on runtime inputs (say allocating buffer for reading files).
The workarounds I use for this are:
// Create a array pointer for on-the-spot usecases like reading from a file.
char *data = new char[size];
...
delete[] data;
or:
// Use unique_ptr as a class member and I don't want to manage the memory myself.
std::unique_ptr<char[]> myarr_ = std::unique_ptr<char[]>(new char[size]);
If I don't care about fixed size, I am aware that I can use std::vector<char> with the size pre-defined as follows:
std::vector<char> my_buf (buf_size);
Why did the designers of std::array choose to ignore this use case? Perhaps I don't understand the real usecase for std::array.
EDIT: I guess another way to phrase my question could also be - Why did the designers choose to have the size passed as a template param and not as a constructor param? Would opting for the latter have made it difficult to provide the functionality that std::array currently has? To me it seems like a deliberate design choice and I don't understand why.

Ease of programming
std::array facilitates several beneficial interfaces and idioms which are used in std::vector. With normal C-style arrays, one cannot have .size() (no sizeof hack), .at() (exception for out of range), front()/back(), iterators, so on. Everything has to be hand-coded.
Many programmers may choose std::vector even for compile time known sized arrays, just because they want to utilize above programming methodologies. But that snatches away the performance available with compile time fixed size arrays.
Hence std::array was provided by the library makers to discourage the C-style arrays, and yet avoid std::vectors when the size is known at the compile time.

The two main reasons I understand are:
std::array implements STL's interfaces for collection-types, allowing an std::array to be passed as-is to functions and methods that accept any STL iterator.
To prevent array pointer decay... (below)
...this is the preservation of type information across function/method boundaries because it prevents Array Pointer Decay.
Given a naked C/C++ array, you can pass it to another function as a parameter argument by 4 ways:
void by_value1 ( const T* array )
void by_value2 ( const T array[] )
void by_pointer ( const T (*array)[U] )
void by_reference( const T (&array)[U] )
by_value1 and by_value2 are both semantically identical and cause pointer decay because the receiving function does not know the sizeof the array.
by_pointer and by_reference both requires that U by a known compile-time constant, but preserve sizeof information.
So if you avoid array decay by using by_pointer or by_reference you now have a maintenance problem every time you change the size of the array you have to manually update all of the call-sites that have that size in U.
By using std::array it's taken care of for you by making those functions template functions where U is a parameter (granted, you could still use the by_pointer and by_reference techniques but with messier syntax).
...so std::array adds a 5th way:
template<typename T, size_t N>
void by_stdarray( const std::array<T,N>& array )

std::array is a replacement for C-style arrays.
The C++ standards don't allow C-style arrays to be declared without compile-time defined sizes.

Should I compare a std::string to "string" or "string"s?

Consider this code snippet:
bool foo(const std::string& s) {
return s == "hello"; // comparing against a const char* literal
}
bool bar(const std::string& s) {
return s == "hello"s; // comparing against a std::string literal
}
At first sight, it looks like comparing against a const char* needs less assembly instructions1, as using a string literal will lead to an in-place construction of the std::string.
(EDIT: As pointed out in the answers, I forgot about the fact that effectively s.compare(const char*) will be called in foo(), so of course no in-place construction takes place in this case. Therefore striking out some lines below.)
However, looking at the operator==(const char*, const std::string&) reference:
All comparisons are done via the compare() member function.
From my understanding, this means that we will need to construct a std::string anyway in order to perform the comparison, so I suspect the overhead will be the same in the end (although hidden by the call to operator==).
Which of the comparisons should I prefer?
Does one version have advantages over the other (may be in specific situations)?
1 I'm aware that less assembly instructions doesn't neccessarily mean faster code, but I don't want to go into micro benchmarking here.

Neither.
If you want to be clever, compare to "string"sv, which returns a std::string_view.
While comparing against a literal like "string" does not result in any allocation-overhead, it's treated as a null terminated string, with all the concomittant disadvantages: No tolerance for embedded nulls, and users must heed the null terminator.
"string"s does an allocation, barring small-string-optimisation or allocation elision. Also, the operator gets passed the length of the literal, no need to count, and it allows for embedded nulls.
And finally using "string"sv combines the advantages of both other approaches, avoiding their individual disadvantages. Also, a std::string_view is a far simpler beast than a std::string, especially if the latter uses SSO as all modern ones do.
At least since C++14 (which generally allowed eliding allocations), compilers could in theory optimise all options to the last one, given sufficient information (generally available for the example) and effort, under the as-if rule. We aren't there yet though.

No, compare() does not require construction of a std::string for const char* operands.
You're using overload #4 here.
The comparison to string literal is the "free" version you're looking for. Instantiating a std::string here is completely unnecessary.

From my understanding, this means that we will need to construct a std::string anyway in order to perform the comparison, so I suspect the overhead will be the same in the end (although hidden by the call to operator==).
This is where that reasoning goes wrong. std::compare does not need to allocate its operand as a C-style null-terminated string to function. According to one of the overloads:
int compare( const CharT* s ) const; // (4)
4) Compares this string to the null-terminated character sequence beginning at the character pointed to by s with length Traits::length(s).
Although whether to allocate or not is an implementation detail, it does not seem reasonable that a sequence comparison would do so.

Check whether equal string literals are stored at the same address

I am developing a (C++) library that uses unordered containers. These require a hasher (usually a specialization of the template structure std::hash) for the types of the elements they store. In my case, those elements are classes that encapsulate string literals, similar to conststr of the example at the bottom of this page. The STL offers an specialization for constant char pointers, which, however, only computes pointers, as explained here, in the 'Notes' section:
There is no specialization for C strings. std::hash<const char*>
produces a hash of the value of the pointer (the memory address), it
does not examine the contents of any character array.
Although this is very fast (or so I think), it is not guaranteed by the C++ standard whether several equal string literals are stored at the same address, as explained in this question. If they aren't, the first condition of hashers wouldn't be met:
For two parameters k1 and k2 that are equal, std::hash<Key>()(k1) ==
std::hash<Key>()(k2)
I would like to selectively compute the hash using the provided specialization, if the aforementioned guarantee is given, or some other algorithm otherwise. Although resorting back to asking those who include my headers or build my library to define a particular macro is feasible, an implementation defined one would be preferable.
Is there any macro, in any C++ implementation, but mainly g++ and clang, whose definition guarantees that several equal string literals are stored at the same address?
An example:
#ifdef __GXX_SAME_STRING_LITERALS_SAME_ADDRESS__
const char str1[] = "abc";
const char str2[] = "abc";
assert( str1 == str2 );
#endif

Is there any macro, in any C++ implementation, but mainly g++ and clang, whose definition guarantees that several equal string literals are stored at the same address?
gcc has the -fmerge-constants option (this is not a guarantee) :
Attempt to merge identical constants (string constants and floating-point constants) across compilation units.
This option is the default for optimized compilation if the assembler and linker support it. Use -fno-merge-constants to inhibit this behavior.
Enabled at levels -O, -O2, -O3, -Os.
Visual Studio has String Pooling (/GF option : "Eliminate Duplicate Strings")
String pooling allows what were intended as multiple pointers to multiple buffers to be multiple pointers to a single buffer. In the following code, s and t are initialized with the same string. String pooling causes them to point to the same memory:
char *s = "This is a character buffer";
char *t = "This is a character buffer";
Note: although MSDN uses char* strings literals, const char* should be used
clang apparently also has the -fmerge-constants option, but I can't find much about it, except in the --help section, so I'm not sure if it really is the equivalent of the gcc's one :
Disallow merging of constants
Anyway, how string literals are stored is implementation dependent (many do store them in the read-only portion of the program).
Rather than building your library on possible implementation-dependent hacks, I can only suggest the usage of std::string instead of C-style strings : they will behave exactly as you expect.
You can construct your std::string in-place in your containers with the emplace() methods :
std::unordered_set<std::string> my_set;
my_set.emplace("Hello");

Although C++ does not seem to allow for any way that works with string literals, there is an ugly but somewhat workable way around the problem if you don't mind rewriting your string literals as character sequences.
template <typename T, T...values>
struct static_array {
static constexpr T array[sizeof...(values)] { values... };
};
template <typename T, T...values>
constexpr T static_array<T, values...>::array[];
template <char...values>
using str = static_array<char, values..., '\0'>;
int main() {
return str<'a','b','c'>::array != str<'a','b','c'>::array;
}
This is required to return zero. The compiler has to ensure that even if multiple translation units instantiate str<'a','b','c'>, those definitions get merged, and you only end up with a single array.
You would need to make sure you don't mix this with string literals, though. Any string literal is guaranteed not to compare equal to any of the template instantiations' arrays.

The tacklelib C++11 library have a macro with the tmpl_string class to hold a literal string as a template class instance. The tmpl_string contains a static string with the same content which guarantees the same address for the same template class instance.
https://sourceforge.net/p/tacklelib/tacklelib/HEAD/tree/trunk/include/tacklelib/tackle/tmpl_string.hpp
Tests:
https://sourceforge.net/p/tacklelib/tacklelib/HEAD/tree/trunk/src/tests/unit/test_tmpl_string.cpp
Example:
const auto s = TACKLE_TMPL_STRING(0, "my literl string")
I've used it in another macro to conveniently and consistently extract a literal string begin/end:
#include <tacklelib/tackle/tmpl_string.hpp>
#include <tacklelib/utility/string_identity.hpp>
//...
std::vector<char> xml_arr;
xml_arr.insert(xml_arr.end(), UTILITY_LITERAL_STRING_WITH_BEGINEND_TUPLE("<?xml version='1.0' encoding='UTF-8'?>\n"));
https://sourceforge.net/p/tacklelib/tacklelib/HEAD/tree/trunk/include/tacklelib/utility/string_identity.hpp

Simplest, safest way of holding a bunch of const char* in a set?

I want to hold a bunch of const char pointers into an std::set container [1]. std::set template requires a comparator functor, and the standard C++ library offers std::less, but its implementation is based on comparing the two keys directly, which is not standard for pointers.
I know I can define my own functor and implement the operator() by casting the pointers to integers and comparing them, but is there a cleaner, 'standard' way of doing it?
Please do not suggest creating std::strings - it is a waste of time and space. The strings are static, so they can be compared for (in)equality based on their address.
1: The pointers are to static strings, so there is no problem with their lifetimes - they won't go away.

If you don't want to wrap them in std::strings, you can define a functor class:
struct ConstCharStarComparator
{
bool operator()(const char *s1, const char *s2) const
{
return strcmp(s1, s2) < 0;
}
};
typedef std::set<const char *, ConstCharStarComparator> stringset_t;
stringset_t myStringSet;

Just go ahead and use the default ordering which is less<>. The Standard guarantees that less will work even for pointers to different objects:
"For templates greater, less, greater_equal, and less_equal, the specializations for any
pointer type yield a total order, even if the built-in operators <, >, <=, >= do not."
The guarantee is there exactly for things like your set<const char*>.

The "optimized way"
If we ignore the "premature optimization is the root of all evil", the standard way is to add a comparator, which is easy to write:
struct MyCharComparator
{
bool operator()(const char * A, const char * B) const
{
return (strcmp(A, B) < 0) ;
}
} ;
To use with a:
std::set<const char *, MyCharComparator>
The standard way
Use a:
std::set<std::string>
It will work even if you put a static const char * inside (because std::string, unlike const char *, is comparable by its contents).
Of course, if you need to extract the data, you'll have to extract the data through std::string.c_str(). In the other hand, , but as it is a set, I guess you only want to know if "AAA" is in the set, not extract the value "AAA" of "AAA".
Note: I did read about "Please do not suggest creating std::strings", but then, you asked the "standard" way...
The "never do it" way
I noted the following comment after my answer:
Please do not suggest creating std::strings - it is a waste of time and space. The strings are static, so they can be compared for (in)equality based on their address.
This smells of C (use of the deprecated "static" keyword, probable premature optimization used for std::string bashing, and string comparison through their addresses).
Anyway, you don't want to to compare your strings through their address. Because I guess the last thing you want is to have a set containing:
{ "AAA", "AAA", "AAA" }
Of course, if you only use the same global variables to contain the string, this is another story.
In this case, I suggest:
std::set<const char *>
Of course, it won't work if you compare strings with the same contents but different variables/addresses.
And, of course, it won't work with static const char * strings if those strings are defined in a header.
But this is another story.

Depending on how big a "bunch" is, I would be inclined to store a corresponding bunch of std::strings in the set. That way you won't have to write any extra glue code.

Must the set contain const char*?
What immediately springs to mind is storing the strings in a std::string instead, and putting those into the std::set. This will allow comparisons without a problem, and you can always get the raw const char* with a simple function call:
const char* data = theString.c_str();

Either use a comparator, or use a wrapper type to be contained in the set. (Note: std::string is a wrapper, too....)
const char* a("a");
const char* b("b");
struct CWrap {
const char* p;
bool operator<(const CWrap& other) const{
return strcmp( p, other.p ) < 0;
}
CWrap( const char* p ): p(p){}
};
std::set<CWrap> myset;
myset.insert(a);
myset.insert(b);

Others have already posted plenty of solutions showing how to do lexical comparisons with const char*, so I won't bother.
Please do not suggest creating std::strings - it is a waste of time and space.
If std::string is a waste of time and space, then std::set might be a waste of time and space as well. Each element in a std::set is allocated separately from the free store. Depending on how your program uses sets, this may hurt performance more than std::set's O(log n) lookups help performance. You may get better results using another data structure, such as a sorted std::vector, or a statically allocated array that is sorted at compile time, depending on the intended lifetime of the set.
the standard C++ library offers std::less, but its implementation is based on comparing the two keys directly, which is not standard for pointers.
The strings are static, so they can be compared for (in)equality based on their address.
That depends on what the pointers point to. If all of the keys are allocated from the same array, then using operator< to compare pointers is not undefined behavior.
Example of an array containing separate static strings:
static const char keys[] = "apple\0banana\0cantaloupe";
If you create a std::set<const char*> and fill it with pointers that point into that array, their ordering will be well-defined.
If, however, the strings are all separate string literals, comparing their addresses will most likely involve undefined behavior. Whether or not it works depends on your compiler/linker implementation, how you use it, and your expectations.
If your compiler/linker supports string pooling and has it enabled, duplicate string literals should have the same address, but are they guaranteed to in all cases? Is it safe to rely on linker optimizations for correct functionality?
If you only use the string literals in one translation unit, the set ordering may be based on the order that the strings are first used, but if you change another translation unit to use one of the same string literals, the set ordering may change.
I know I can define my own functor and implement the operator() by casting the pointers to integers and comparing them
Casting the pointers to uintptr_t would seem to have no benefit over using pointer comparisons. The result is the same either way: implementation-specific.

Presumably you don't want to use std::string because of performance reasons.
I'm running MSVC and gcc, and they both seem to not mind this:
bool foo = "blah" < "grar";
EDIT: However, the behaviour in this case is unspecified. See comments...
They also don't complain about std::set<const char*>.
If you're using a compiler that does complain, I would probably go ahead with your suggested functor that casts the pointers to ints.
Edit:
Hey, I got voted down... Despite being one of the few people here that most directly answered his question. I'm new to Stack Overflow, is there any way to defend yourself if this happens? That being said, I'll try to right here:
The question is not looking for std::string solutions. Every time you enter an std::string in to the set, it will need to copy the entire string (until C++0x is standard, anyway). Also, every time you do a set look-up, it will need to do multiple string compares.
Storing the pointers in the set, however, incurs NO string copy (you're just copying the pointer around) and every comparison is a simple integer comparison on the addresses, not a string compare.
The question stated that storing the pointers to the strings was fine, I see no reason why we should all immediately assume that this statement was an error. If you know what you're doing, then there are considerable performance gains to using a const char* over either std::string or a custom comparison that calls strcmp. Yes, it's less safe, and more prone to error, but these are common trade-offs for performance, and since the question never stated the application, I think we should assume that he's already considered the pros and cons and decided in favor of performance.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

C++ constexpr std::array of string literals - c++

Related

C++ - std::initializer_list vs std::span

In C++ what is the point of std::array if the size has to be determined at compile time?

Should I compare a std::string to "string" or "string"s?

Check whether equal string literals are stored at the same address

Simplest, safest way of holding a bunch of const char* in a set?

Categories

Resources