Indentation aware raw string literals - c++

Is there a way to have raw string literals that are aware of the indentation?
e.g.
{
std::string_view str(
R"(
    Hello
    World
)");
std::cout << "ref\n" << str;
}
prints
ref

    Hello
    World
but I would like
ref
Hello
World
I see this answer solves this, but it works at run time.
With C23, I think #embed might solve this.
But is there a way to do this at compile time, preferably with C++17? C++20 is okay too.

tl;dr
It is possible to do this in C++20 with a user-defined literal operator.
Example: godbolt
std::cout << R"(
    Hello
    World
    )"_M << std::endl;
It would theoretically also be possible to do this in C++17, but there the syntax would not be as nice:
constexpr auto str = unindent(R"(
    Hello
    World!
    )"); // type of str would be std::array<char, N>
std::cout << str.data() << std::endl;
1. Why a C++17 implementation would be suboptimal
The biggest problem with C++17 in this case is that we need to modify the string (remove spaces within it) - so we need to store the modified string somewhere.
In C++20 we can get the string as a template argument, utilizing a string literal operator template, e.g.:
template<class _char_type, std::size_t size>
struct string_wrapper {
using char_type = _char_type;
consteval string_wrapper(const char_type (&arr)[size]) {
std::ranges::copy(arr, str);
}
char_type str[size];
};
template<string_wrapper str>
consteval decltype(auto) operator"" _M() {
return do_unindent<str>();
}
// R"(foo)"_M
// would be interpreted as:
// operator"" _M<string_wrapper{R"(foo)"}>()
This allows us to store the unindented string within a template parameter, so its lifetime will never be a problem: we can directly return a reference to the array stored within the template parameter.
In C++17, on the other hand, class types can't be used as non-type template arguments, and the string literal operator template is not available either. So there's no neat way to store the modified string globally (static variables are not allowed inside constexpr functions, so that's not an option).
So the only way to implement this in C++17 would be to return a std::array (or something similar):
template<std::size_t size>
constexpr auto unindent(const char (&str)[size]) {
return std::array<char, size>{ /* ... */ };
}
That however has a few major downsides:
You always have to call .data() to convert the std::array back into a const char*:
std::cout << unindent(R"(foo)").data() << std::endl;
You have to be careful to keep the std::array alive:
// BAD: Whoops: dangling pointer!
const char* str = unindent(R"(foo)").data();
// GOOD: keep array alive
auto arr = unindent(R"(foo)");
const char* str = arr.data();
Note that the array we return would need to be bigger than the actual unindented string (std::array<char, size>). Due to str being a parameter to the function it is not a constant expression, so we can't use it to determine the actual size of the unindented string (so we only have the length of the original string to work with):
constexpr std::array<char, 8> arr = unindent("  a\n  b");
// arr would be {'a', '\n', 'b', '\0', '\0', '\0', '\0', '\0'};
So this would most likely result in a lot of additional '\0's in your final binary, in addition to being rather confusing (arr.size() is not the true length of the string).
We can't use user-defined literals in a useful way; the only string-literal variant supported in C++17 is:
constexpr /*...*/ operator"" _M(const char* str, std::size_t len) {
return std::array<char, 1000> { /* ... */ };
}
... but with that, neither str nor its length is a constant expression, so we can use neither for the size of our std::array; the only possible option would be to use a fixed-size array, which is not very useful.
For those reasons I'll only provide a C++20 implementation.
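For comparison, here is a rough C++17 sketch of the std::array-returning approach described above. It is a hypothetical `unindent`, assuming indentation made only of `' '` and `'\n'` line endings, and unlike the C++20 implementation it does not drop a leading or trailing blank line. It mainly illustrates the trailing `'\0'` padding.

```cpp
#include <array>
#include <cstddef>

// Hypothetical C++17 sketch: removes the smallest ' ' indentation shared by all
// non-blank lines ('\n' line endings assumed). The result keeps the input's
// size N, so unused trailing slots stay '\0'.
template <std::size_t N>
constexpr std::array<char, N> unindent(const char (&str)[N]) {
    const std::size_t len = N - 1;  // index of the literal's '\0'

    // Pass 1: minimal indentation over all non-blank lines.
    std::size_t min_indent = len;
    std::size_t i = 0;
    while (i < len) {
        std::size_t indent = 0;
        while (i < len && str[i] == ' ') { ++indent; ++i; }
        const bool blank = (i == len || str[i] == '\n');
        if (!blank && indent < min_indent) min_indent = indent;
        while (i < len && str[i] != '\n') ++i;  // skip the rest of the line
        if (i < len) ++i;                        // skip the '\n'
    }
    if (min_indent == len) min_indent = 0;  // no non-blank line at all

    // Pass 2: copy, dropping the first min_indent characters of every line.
    std::array<char, N> out{};
    std::size_t o = 0, col = 0;
    for (std::size_t j = 0; j < len; ++j) {
        if (str[j] == '\n') { out[o++] = '\n'; col = 0; continue; }
        if (col++ < min_indent) continue;  // part of the common indentation
        out[o++] = str[j];
    }
    return out;
}
```

With this sketch, `unindent("  a\n  b")` is a `std::array<char, 8>` whose `.data()` reads as `"a\nb"`, while `.size()` still reports 8.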
2. Full Implementation
This is the full C++20 implementation:
godbolt
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <utility> // std::declval
#include <vector>
#include <ranges>
namespace multiline_raw_string {
template<class char_type>
using string_view = std::basic_string_view<char_type>;
// characters that are considered space
// we need this because std::isspace is not constexpr
template<class char_type>
constexpr string_view<char_type> space_chars = std::declval<string_view<char_type>>();
template<>
constexpr string_view<char> space_chars<char> = " \f\n\r\t\v";
template<>
constexpr string_view<wchar_t> space_chars<wchar_t> = L" \f\n\r\t\v";
template<>
constexpr string_view<char8_t> space_chars<char8_t> = u8" \f\n\r\t\v";
template<>
constexpr string_view<char16_t> space_chars<char16_t> = u" \f\n\r\t\v";
template<>
constexpr string_view<char32_t> space_chars<char32_t> = U" \f\n\r\t\v";
// list of all potential line endings that could be encountered
template<class char_type>
constexpr string_view<char_type> potential_line_endings[] = std::declval<string_view<char_type>[]>();
template<>
constexpr string_view<char> potential_line_endings<char>[] = {
"\r\n",
"\r",
"\n"
};
template<>
constexpr string_view<wchar_t> potential_line_endings<wchar_t>[] = {
L"\r\n",
L"\r",
L"\n"
};
template<>
constexpr string_view<char8_t> potential_line_endings<char8_t>[] = {
u8"\r\n",
u8"\r",
u8"\n"
};
template<>
constexpr string_view<char16_t> potential_line_endings<char16_t>[] = {
u"\r\n",
u"\r",
u"\n"
};
template<>
constexpr string_view<char32_t> potential_line_endings<char32_t>[] = {
U"\r\n",
U"\r",
U"\n"
};
// null-terminator for the different character types
template<class char_type>
constexpr char_type null_char = std::declval<char_type>();
template<>
constexpr char null_char<char> = '\0';
template<>
constexpr wchar_t null_char<wchar_t> = L'\0';
template<>
constexpr char8_t null_char<char8_t> = u8'\0';
template<>
constexpr char16_t null_char<char16_t> = u'\0';
template<>
constexpr char32_t null_char<char32_t> = U'\0';
// detects the line ending used within a string.
// e.g. detect_line_ending("foo\nbar\nbaz") -> "\n"
template<class char_type>
consteval string_view<char_type> detect_line_ending(string_view<char_type> str) {
return *std::ranges::max_element(
potential_line_endings<char_type>,
{},
[str](string_view<char_type> line_ending) {
// count the number of lines we would get with line_ending
auto view = std::views::split(str, line_ending);
return std::ranges::distance(view);
}
);
}
// returns a view to the leading sequence of space characters within a string
// e.g. get_leading_space_sequence(" \t foo") -> " \t "
template<class char_type>
consteval string_view<char_type> get_leading_space_sequence(string_view<char_type> line) {
return line.substr(0, line.find_first_not_of(space_chars<char_type>));
}
// checks if a line consists purely out of space characters
// e.g. is_line_empty(" \t") -> true
// is_line_empty(" foo") -> false
template<class char_type>
consteval bool is_line_empty(string_view<char_type> line) {
return get_leading_space_sequence(line).size() == line.size();
}
// splits a string into individual lines
// and removes the first & last line if they are empty
// e.g. split_lines("\na\nb\nc\n", "\n") -> {"a", "b", "c"}
template<class char_type>
consteval std::vector<string_view<char_type>> split_lines(
string_view<char_type> str,
string_view<char_type> line_ending
) {
std::vector<string_view<char_type>> lines;
for (auto line : std::views::split(str, line_ending)) {
lines.emplace_back(line.begin(), line.end());
}
// remove first/last lines in case they are completely empty
if(lines.size() > 1 && is_line_empty(lines[0])) {
lines.erase(lines.begin());
}
if(lines.size() > 1 && is_line_empty(lines[lines.size()-1])) {
lines.erase(lines.end()-1);
}
return lines;
}
// determines the longest possible sequence of space characters
// that we can remove from each line.
// e.g. determine_common_space_prefix_sequence({" \ta", " foo", " \t\tbar"}) -> " "
template<class char_type>
consteval string_view<char_type> determine_common_space_prefix_sequence(
std::vector<string_view<char_type>> const& lines
) {
std::vector<string_view<char_type>> space_sequences = {
string_view<char_type>{} // empty string
};
for(string_view<char_type> line : lines) {
string_view<char_type> spaces = get_leading_space_sequence(line);
for(std::size_t len = 1; len <= spaces.size(); len++) {
space_sequences.emplace_back(spaces.substr(0, len));
}
// remove duplicates
std::ranges::sort(space_sequences);
auto [first, last] = std::ranges::unique(space_sequences);
space_sequences.erase(first, last);
}
// only consider space prefix sequences that apply to all lines
auto shared_prefixes = std::views::filter(
space_sequences,
[&lines](string_view<char_type> prefix) {
return std::ranges::all_of(
lines,
[&prefix](string_view<char_type> line) {
return line.starts_with(prefix);
}
);
}
);
// select the longest possible space prefix sequence
return *std::ranges::max_element(
shared_prefixes,
{},
&string_view<char_type>::size
);
}
// unindents the individual lines of a raw string literal
// e.g. unindent_string(" \n a\n b\n c\n") -> "a\nb\nc"
template<class char_type>
consteval std::vector<char_type> unindent_string(string_view<char_type> str) {
string_view<char_type> line_ending = detect_line_ending(str);
std::vector<string_view<char_type>> lines = split_lines(str, line_ending);
string_view<char_type> common_space_sequence = determine_common_space_prefix_sequence(lines);
std::vector<char_type> new_string;
bool is_first = true;
for(auto line : lines) {
// append newline
if(is_first) {
is_first = false;
} else {
new_string.insert(new_string.end(), line_ending.begin(), line_ending.end());
}
// append unindented line
auto unindented = line.substr(common_space_sequence.size());
new_string.insert(new_string.end(), unindented.begin(), unindented.end());
}
// add null terminator
new_string.push_back(null_char<char_type>);
return new_string;
}
// returns the size required for the unindented string
template<class char_type>
consteval std::size_t unindent_string_size(string_view<char_type> str) {
return unindent_string(str).size();
}
// simple type that stores a raw string
// we need this to get around the limitation that string literals
// are not considered valid non-type template arguments.
template<class _char_type, std::size_t size>
struct string_wrapper {
using char_type = _char_type;
consteval string_wrapper(const char_type (&arr)[size]) {
std::ranges::copy(arr, str);
}
char_type str[size];
};
// used for sneakily creating and storing
// the unindented string in a template parameter.
template<string_wrapper sw>
struct unindented_string_wrapper {
using char_type = typename decltype(sw)::char_type;
static constexpr std::size_t buffer_size = unindent_string_size<char_type>(sw.str);
using array_ref = const char_type (&)[buffer_size];
consteval unindented_string_wrapper(int) {
auto newstr = unindent_string<char_type>(sw.str);
std::ranges::copy(newstr, buffer);
}
consteval array_ref get() const {
return buffer;
}
char_type buffer[buffer_size];
};
// uses a defaulted template argument that depends on the str
// to initialize the unindented string within a template parameter.
// this enables us to return a reference to the unindented string.
template<string_wrapper str, unindented_string_wrapper<str> unindented = 0>
consteval decltype(auto) do_unindent() {
return unindented.get();
}
// the actual user-defined string literal operator
template<string_wrapper str>
consteval decltype(auto) operator"" _M() {
return do_unindent<str>();
}
}
using multiline_raw_string::operator"" _M;
Usage Example: godbolt
std::cout << R"(
    a
    b
    c
    d
    )"_M << std::endl;
/* Will print the following:
a
b
c
d
*/
// The type of R"(...)"_M is still const char (&)[N],
// so it can be used like a normal string literal:
std::cout << std::size(R"(asdf)"_M) << std::endl;
// (will print 5)
// Lifetime is not a problem; can be stored in a std::string_view:
constexpr std::string_view str = R"(
    foo
    bar
    )"_M;
// also works with wchar_t, char8_t, char16_t and char32_t literals:
std::wcout << LR"(foo)"_M << std::endl;
3. Implementation Notes
What characters are considered spaces?
std::isspace() is unfortunately not constexpr, so we have to roll our own (you can add any further characters that you want to be treated as "space" for the purpose of unindenting):
template<>
constexpr string_view<char> space_chars<char> = " \f\n\r\t\v";
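As a small sketch of how such a character set can stand in for std::isspace in constant expressions (the scalar `space_chars` and `is_space` below are hypothetical, `char`-only simplifications of the templated version):

```cpp
#include <string_view>

// Hypothetical char-only version of the answer's space set.
constexpr std::string_view space_chars = " \f\n\r\t\v";

// constexpr replacement for std::isspace: membership test in the set above.
constexpr bool is_space(char c) {
    return space_chars.find(c) != std::string_view::npos;
}

static_assert(is_space(' ') && is_space('\t') && is_space('\v'));
static_assert(!is_space('x'));
```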
Line Endings
The line endings within the raw string literal depend on the line endings used in the source code. Unfortunately, there's no nice way to get the compiler to tell us which line endings are in use (there might even be mixed ones within a single source file), so we have to manually detect which kind of line ending is present in the string literal.
I've implemented \r\n, \n and \r as potential line endings, those should cover most use-cases:
template<>
constexpr string_view<char> potential_line_endings<char>[] = {
"\r\n",
"\r",
"\n"
};
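The selection rule in detect_line_ending can be mimicked without ranges. This is a C++17 sketch with a hypothetical helper `count_occurrences`: count how often each candidate occurs and keep the earliest candidate on ties. That tie-breaking is why `"\r\n"` has to come first in the list, since a CRLF string contains exactly as many `"\r"` and `"\n"` as `"\r\n"`.

```cpp
#include <cstddef>
#include <string_view>

// Counts non-overlapping occurrences of a non-empty pattern.
constexpr std::size_t count_occurrences(std::string_view s, std::string_view pat) {
    std::size_t count = 0;
    for (std::size_t pos = s.find(pat); pos != std::string_view::npos;
         pos = s.find(pat, pos + pat.size()))
        ++count;
    return count;
}

constexpr std::string_view detect_line_ending(std::string_view s) {
    constexpr std::string_view candidates[] = {"\r\n", "\r", "\n"};
    std::string_view best = candidates[0];
    std::size_t best_count = count_occurrences(s, best);
    for (std::string_view c : candidates) {
        std::size_t n = count_occurrences(s, c);
        if (n > best_count) { best = c; best_count = n; }  // strict '>' keeps the earliest on ties
    }
    return best;
}

static_assert(detect_line_ending("a\r\nb\r\nc") == "\r\n");
static_assert(detect_line_ending("a\nb\nc") == "\n");
```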
Dealing with mixed indentation
The space characters used for indentation must be identical on each line; otherwise they won't be removed. So you can use whatever space characters you like for indentation (and even mix them), as long as you're consistent across all lines.
e.g. this would not work:
// mixed indentation (indentation will remain)
constexpr std::string_view str = R"(
<tab>foo
bar
)"_M;
but this will:
// all lines have the same indentation pattern
constexpr std::string_view str = R"(
<tab> foo
<tab> bar
)"_M;
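The rule can be checked in isolation with a simplified sketch. Instead of the answer's collect-and-filter approach, it computes the longest common prefix of the leading-space runs (hypothetical helpers `leading_space` and `common_prefix`), which arguably yields the same prefix:

```cpp
#include <cstddef>
#include <string_view>

// Leading run of space characters (whole line if it is all spaces).
constexpr std::string_view leading_space(std::string_view line) {
    return line.substr(0, line.find_first_not_of(" \f\n\r\t\v"));
}

// Longest common prefix of two whitespace runs.
constexpr std::string_view common_prefix(std::string_view a, std::string_view b) {
    std::size_t i = 0;
    while (i < a.size() && i < b.size() && a[i] == b[i]) ++i;
    return a.substr(0, i);
}

// "\t" vs "    ": nothing in common, so nothing is removed.
static_assert(common_prefix(leading_space("\tfoo"), leading_space("    bar")).empty());
// Identical pattern on both lines: the whole run is removable.
static_assert(common_prefix(leading_space("\t    foo"), leading_space("\t    bar")) == "\t    ");
```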

There's always implicit string literal concatenation. You still need the \n characters, but the relative positions of lines are clear.
{
std::string_view str(
"Hello\n"
" World\n"
);
std::cout << "ref\n" << str;
}

Related

Transforming a string_view in-place

std::transform, as of C++20, is declared constexpr. I have a bunch of string utility functions that take std::string arguments, but a lot of the usage ends up just passing in small, short, character literal sequences at compile-time. I thought I would leverage this fact and declare versions that are constexpr and take std::string_views instead of creating temporary std::string variables just to throw them away...
ORIGINAL STD::STRING VERSION:
[[nodiscard]] std::string ToUpperCase(std::string string) noexcept {
std::transform(string.begin(), string.end(), string.begin(), [](unsigned char c) -> unsigned char { return std::toupper(c, std::locale("")); });
return string;
}
NEW STD::STRING_VIEW VERSION:
[[nodiscard]] constexpr std::string_view ToUpperCase(std::string_view stringview) noexcept {
std::transform(stringview.begin(), stringview.end(), stringview.begin(), [](unsigned char c) -> unsigned char { return std::toupper(c, std::locale("")); });
return stringview;
}
But MSVC complains:
error C3892: '_UDest': you cannot assign to a variable that is const
Is there a way to call std::transform with a std::string_view and put it back into the std::string_view or am I going to have to create a local string and return that, thereby defeating the purpose of using std::string_view in the first place?
[[nodiscard]] constexpr std::string ToUpperCase(std::string_view stringview) noexcept {
std::string copy{stringview};
std::transform(stringview.begin(), stringview.end(), copy.begin(), [](unsigned char c) -> unsigned char { return std::toupper(c, std::locale("")); });
return copy;
}
You can't in-place transform a std::string_view - what if it has been constructed from char const*?
a lot of the usage ends up just passing in small, short, character literal sequences at compile-time.
...but you can lift string literals to the type level
namespace impl {
template<std::size_t n> struct Str {
std::array<char, n> raw{};
constexpr Str(char const (&src)[n + 1]) { std::copy_n(src, n, raw.begin()); }
};
template<std::size_t n> Str(char const (&)[n]) -> Str<n - 1>;
}
template<char... cs> struct Str { static char constexpr value[]{cs..., '\0'}; };
template<impl::Str s>
auto constexpr str_v = []<std::size_t... is>(std::index_sequence<is...>) {
return Str<s.raw[is]...>{};
}(std::make_index_sequence<s.raw.size()>{});
...and add a special case. In general, this hack can be avoided with range/tuple polymorphic algorithms.
[[nodiscard]] constexpr auto ToUpperCase(auto str) {
for (auto&& c: str) c = ConstexprToUpper(c); // std::toupper doesn't seem constexpr
return str;
}
template<char... cs> [[nodiscard]] constexpr auto ToUpperCase(Str<cs...>) {
return Str<ConstexprToUpper(cs)...>{};
}
So, to use that transformation optimized for character literal sequences, now write ToUpperCase(str_v<"abc">) instead of ToUpperCase("abc"sv). If you always want string_view as output, return std::string_view{Str<ConstexprToUpper(cs)...>::value} in that overload.
As said in one comment, span is a better vocabulary type for this because individual elements can be modified.
Also I wouldn't make it nodiscard, because it can be useful even without assigning the result:
#include<algorithm>
#include<cassert>
#include<cctype>
#include<locale>
#include<string_view>
#include<span>
#include<string>
constexpr std::span<char> ToUpperCase(std::span<char> stringview) noexcept {
std::transform(stringview.begin(), stringview.end(), stringview.begin(), [](unsigned char c) -> unsigned char { return std::toupper(c); });
return stringview;
}
int main() {
std::string a = "compiler";
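std::span<char> upper = ToUpperCase(a); // modifies a in place and returns a view of it
std::string b(upper.begin(), upper.end()); // std::span has no implicit conversion to std::string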
assert( a == "COMPILER");
assert( b == "COMPILER");
}
https://godbolt.org/z/Toz8Y9bj9
Somewhat departing from your original aim...
I think this is more elegant, although subject to bloating and ugly compilation errors.
It has the same effect in the cases provided.
Also I don't like the design of span (or string_view for that matter)
(Exercise: add Concepts)
template<class StringRange>
constexpr StringRange&& ToUpperCase(StringRange&& stringview) noexcept {
std::transform(stringview.begin(), stringview.end(), stringview.begin(), [](unsigned char c) -> unsigned char { return std::toupper(c); });
return std::forward<StringRange>(stringview);
}
https://godbolt.org/z/e9aWKMerE
I find myself using this idiom quite a bit lately.

C++ How can I improve this bit of template meta-program to give back the array including the size?

I've got a utility called choose_literal which chooses a literal string encoded as char*, wchar_t*, char8_t*, char16_t* or char32_t* depending on the desired type (the choice).
It looks like this:
template <typename T>
constexpr auto choose_literal(const char * psz, const wchar_t * wsz, const CHAR8_T * u8z, const char16_t * u16z, const char32_t * u32z) {
if constexpr (std::is_same_v<T, char>)
return psz;
if constexpr (std::is_same_v<T, wchar_t>)
return wsz;
#ifdef char8_t
if constexpr (std::is_same_v<T, char8_t>)
return u8z;
#endif
if constexpr (std::is_same_v<T, char16_t>)
return u16z;
if constexpr (std::is_same_v<T, char32_t>)
return u32z;
}
I supply a little preprocessor macro to make this work without having to type each of those string encodings manually:
// generates the appropriate character literal using preprocessor voodoo
// usage: LITERAL(char-type, "literal text")
#define LITERAL(T,x) details::choose_literal<T>(x, L##x, u8##x, u##x, U##x)
This of course only works for literal strings which can be encoded in the target format by the compiler - but something like an empty string can be, as can ASCII characters (i.e. a-z, 0-9, etc., which have representations in all of those encodings).
e.g. here's a trivial bit of code that will return the correct empty-string given a valid character type 'T':
template <typename T>
constexpr const T * GetBlank() {
return LITERAL(T, "");
}
This is great as far as it goes, and it works well enough in my code.
What I'd like to do is to refactor this such that I get back the character-array including its size, as if I'd written something like:
const char blank[] = "";
or
const wchar_t blank[] = L"";
Which allows the compiler to know the length of the string-literal, not just its address.
My choose_literal<T>(str) returns only the const T * rather than the const T (&)[size] which would be ideal.
In general I'd love to be able to pass such entities around intact - rather than have them devolve into just a pointer.
But in this specific case, is there a technique you might point me towards that allows me to declare a struct with a data-member for the desired encoding which then also knows its array-length?
A little bit of constexpr recursion magic allows you to return a string_view of the appropriate type.
#include <string_view>
#include <type_traits>
#include <iostream>
template <typename T, class Choice, std::size_t N, class...Rest>
constexpr auto choose_literal(Choice(& choice)[N], Rest&...rest)
{
using const_char_type = Choice;
using char_type = std::remove_const_t<const_char_type>;
if constexpr (std::is_same_v<T, char_type>)
{
constexpr auto extent = N;
return std::basic_string_view<char_type>(choice, extent - 1);
}
else
{
return choose_literal<T>(rest...);
}
}
int main()
{
auto clit = choose_literal<char>("hello", L"hello");
std::cout << clit;
auto wclit = choose_literal<wchar_t>("hello", L"hello");
std::wcout << wclit;
}
https://godbolt.org/z/4roZ_O
If it were me, I'd probably want to wrap this and other functions into a constexpr class which offers common services like printing the literal in the correct form depending on the stream type, and creating the correct kind of string from the literal.
For example:
#include <string_view>
#include <type_traits>
#include <iostream>
#include <tuple>
template <typename T, class Choice, std::size_t N, class...Rest>
constexpr auto choose_literal(Choice(& choice)[N], Rest&...rest)
{
using const_char_type = Choice;
using char_type = std::remove_const_t<const_char_type>;
if constexpr (std::is_same_v<T, char_type>)
{
constexpr auto extent = N;
return std::basic_string_view<char_type>(choice, extent - 1);
}
else
{
return choose_literal<T>(rest...);
}
}
template<class...Choices>
struct literal_chooser
{
constexpr literal_chooser(Choices&...choices)
: choices_(choices...)
{}
template<class T>
constexpr auto choose()
{
auto invoker = [](auto&...choices)
{
return choose_literal<T>(choices...);
};
return std::apply(invoker, choices_);
}
std::tuple<Choices&...> choices_;
};
template<class Char, class...Choices>
std::basic_ostream<Char>& operator<<(std::basic_ostream<Char>& os, literal_chooser<Choices...> chooser)
{
return os << chooser.template choose<Char>();
}
template<class Char, class...Choices>
std::basic_string<Char> to_string(literal_chooser<Choices...> chooser)
{
auto sview = chooser.template choose<Char>();
return std::basic_string<Char>(sview.data(), sview.size());
}
int main()
{
auto lit = literal_chooser("hello", L"hello");
std::cout << lit << std::endl;
std::wcout << lit << std::endl;
auto s1 = to_string<char>(lit);
auto s2 = to_string<wchar_t>(lit);
std::cout << s1 << std::endl;
std::wcout << s2 << std::endl;
}
The use of the reference argument type Choices& is important. C++ string literals are references to arrays of const Char. Passing by value would result in the literal being decayed into a pointer, which would lose information about the extent of the array.
We can add other services, written in terms of the literal_chooser:
template<class Char, class...Choices>
constexpr std::size_t size(literal_chooser<Choices...> chooser)
{
auto sview = chooser.template choose<Char>();
return sview.size();
}
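The remark about taking the literals by reference (Choices&) can be checked directly with two hypothetical helpers. Deduction from a by-value parameter decays the array to a pointer, while deduction from a reference parameter keeps the extent:

```cpp
#include <cstddef>

// By value: T is deduced as const char* (the array decays).
template <class T>
constexpr std::size_t size_by_value(T) { return sizeof(T); }

// By reference: T is deduced as const char[6] (the extent is kept).
template <class T>
constexpr std::size_t size_by_reference(T&) { return sizeof(T); }

static_assert(size_by_reference("hello") == 6);
static_assert(size_by_value("hello") == sizeof(const char*));
```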
We're going to change the function so that it takes a const T (&)[size] for each input, and make the return type decltype(auto). Using decltype(auto) preserves the exact declared type of the returned parameter, so a reference to an array stays a reference to an array instead of decaying into a pointer.
Updated function:
template <typename T, size_t N1, size_t N2, size_t N3, size_t N4>
constexpr decltype(auto) choose_literal(const char (&psz)[N1], const wchar_t (&wsz)[N2], const char16_t (&u16z)[N3], const char32_t (&u32z)[N4]) {
if constexpr (std::is_same<T, char>())
return psz;
if constexpr (std::is_same<T, wchar_t>())
return wsz;
if constexpr (std::is_same<T, char16_t>())
return u16z;
if constexpr (std::is_same<T, char32_t>())
return u32z;
}
In main, we can bind the result to a reference declared as auto&&:
#define LITERAL(T,x) choose_literal<T>(x, L##x, u##x, U##x)
int main() {
constexpr auto&& literal = LITERAL(char, "hello");
return sizeof(literal); // Returns 6
}
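The difference between decltype(auto) and plain auto can be seen with two hypothetical one-liners. `return s;` names the parameter, so decltype(auto) yields its declared reference-to-array type, while auto applies template deduction rules and decays:

```cpp
#include <cstddef>
#include <type_traits>

template <std::size_t N>
constexpr decltype(auto) keep_array(const char (&s)[N]) { return s; }  // const char (&)[N]

template <std::size_t N>
constexpr auto decay_array(const char (&s)[N]) { return s; }           // const char*

static_assert(std::is_same_v<decltype(keep_array("hello")), const char (&)[6]>);
static_assert(std::is_same_v<decltype(decay_array("hello")), const char*>);
static_assert(sizeof(keep_array("hello")) == 6);
```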
Potential simplification
We can simplify the choose_literal function by making it recursive, so that it can be extended to any number of types. This works without any changes to the LITERAL macro.
template<class T, class Char, size_t N, class... Rest>
constexpr decltype(auto) choose_literal(const Char(&result)[N], Rest const&... rest) {
if constexpr(std::is_same_v<T, Char>)
return result;
else
return choose_literal<T>(rest...);
}

How to define string literal with character type that depends on template parameter?

template<typename CharType>
class StringTraits {
public:
static const CharType NULL_CHAR = '\0';
static constexpr CharType* WHITESPACE_STR = " ";
};
typedef StringTraits<char> AStringTraits;
typedef StringTraits<wchar_t> WStringTraits;
I know I could do it with template specialization, but this would require some duplication (by defining string literals with and without L prefix).
Is there a simpler way to define const/constexpr char/wchar_t and char*/wchar_t* with same string literal in a template class?
There are several ways to do this, depending on the available version of the C++ standard.
If you have C++17 available, you can scroll down to Method 3, which is the most elegant solution in my opinion.
Note: Methods 1 and 3 assume that the characters of the string literal will be restricted to 7-bit ASCII. This requires that characters are in the range [0..127] and that the execution character set is compatible with 7-bit ASCII (e.g. Windows-1252 or UTF-8). Otherwise the simple casting of char values to wchar_t used by these methods won't give the correct result.
Method 1 - aggregate initialization (C++03)
The simplest way is to define an array using aggregate initialization:
template<typename CharType>
class StringTraits {
public:
static const CharType NULL_CHAR = '\0';
static constexpr CharType WHITESPACE_STR[] = {'a','b','c',0};
};
Method 2 - template specialization and macro (C++03)
(Another variant is shown in this answer.)
The aggregate initialization method can be cumbersome for long strings. For more comfort, we can use a combination of template specialization and macros:
template< typename CharT > constexpr CharT const* NarrowOrWide( char const*, wchar_t const* );
template<> constexpr char const* NarrowOrWide< char >( char const* c, wchar_t const* )
{ return c; }
template<> constexpr wchar_t const* NarrowOrWide< wchar_t >( char const*, wchar_t const* w )
{ return w; }
#define TOWSTRING1(x) L##x
#define TOWSTRING(x) TOWSTRING1(x)
#define NARROW_OR_WIDE( C, STR ) NarrowOrWide< C >( ( STR ), TOWSTRING( STR ) )
Usage:
template<typename CharType>
class StringTraits {
public:
static constexpr CharType const* WHITESPACE_STR = NARROW_OR_WIDE( CharType, " " );
};
Live Demo at Coliru
Explanation:
The template function NarrowOrWide() returns either the first (char const*) or the second (wchar_t const*) argument, depending on template parameter CharT.
The macro NARROW_OR_WIDE is used to avoid having to write both the narrow and the wide string literal. The macro TOWSTRING simply prepends the L prefix to the given string literal.
Of course the macro will only work if the range of characters is limited to basic ASCII, but this is usually sufficient. Otherwise one can use the NarrowOrWide() template function to define narrow and wide string literals separately.
Notes:
I would add a "unique" prefix to the macro names, something like the name of your library, to avoid conflicts with similar macros defined elsewhere.
Method 3 - array initialized via template parameter pack (C++17)
C++17 finally allows us to get rid of the macro and use a pure C++ solution. The solution uses template parameter pack expansion to initialize an array from a string literal while static_casting the individual characters to the desired type.
First we declare a str_array class, which is similar to std::array but tailored for constant null-terminated strings (e.g. str_array::size() returns the number of characters without the '\0', instead of the buffer size). This wrapper class is necessary because a plain array cannot be returned from a function; it must be wrapped in a struct or class.
template< typename CharT, std::size_t Length >
struct str_array
{
constexpr CharT const* c_str() const { return data_; }
constexpr CharT const* data() const { return data_; }
constexpr CharT operator[]( std::size_t i ) const { return data_[ i ]; }
constexpr CharT const* begin() const { return data_; }
constexpr CharT const* end() const { return data_ + Length; }
constexpr std::size_t size() const { return Length; }
// TODO: add more members of std::basic_string
CharT data_[ Length + 1 ]; // +1 for null-terminator
};
So far, nothing special. The real trickery is done by the following str_array_cast() function, which initializes the str_array from a string literal while static_casting the individual characters to the desired type:
#include <stdexcept> // std::out_of_range
#include <utility>   // std::index_sequence
namespace detail {
template< typename ResT, typename SrcT >
constexpr ResT static_cast_ascii( SrcT x )
{
if( !( x >= 0 && x <= 127 ) )
throw std::out_of_range( "Character value must be in basic ASCII range (0..127)" );
return static_cast<ResT>( x );
}
template< typename ResElemT, typename SrcElemT, std::size_t N, std::size_t... I >
constexpr str_array< ResElemT, N - 1 > do_str_array_cast( const SrcElemT(&a)[N], std::index_sequence<I...> )
{
return { static_cast_ascii<ResElemT>( a[I] )..., 0 };
}
} //namespace detail
template< typename ResElemT, typename SrcElemT, std::size_t N, typename Indices = std::make_index_sequence< N - 1 > >
constexpr str_array< ResElemT, N - 1 > str_array_cast( const SrcElemT(&a)[N] )
{
return detail::do_str_array_cast< ResElemT >( a, Indices{} );
}
The template parameter pack expansion trickery is required because constant arrays can only be initialized via aggregate initialization (e.g. const str_array<char,3> s = {'a','b','c',0};), so we have to "convert" the string literal to such an initializer list.
The code triggers a compile-time error if any character is outside the basic ASCII range (0..127) and the conversion is evaluated at compile time (as in the constexpr usage below), for the reasons given at the beginning of this answer. There are code pages where 0..127 doesn't map to ASCII, so this check does not give 100% safety though.
Usage:
template< typename CharT >
struct StringTraits
{
static constexpr auto WHITESPACE_STR = str_array_cast<CharT>( "abc" );
// Fails to compile (as intended), because characters are not basic ASCII.
//static constexpr auto WHITESPACE_STR1 = str_array_cast<CharT>( "äöü" );
};
Live Demo at Coliru
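Why the range check becomes a compile-time error only during constant evaluation can be seen with a reduced, hypothetical `widen_ascii` modeled on static_cast_ascii. A `throw` may appear in a constexpr function, but actually reaching it during constant evaluation makes the expression non-constant:

```cpp
#include <stdexcept>

// Hypothetical single-type version of detail::static_cast_ascii.
constexpr wchar_t widen_ascii(char c) {
    if (!(c >= 0 && c <= 127))
        throw std::out_of_range("Character value must be in basic ASCII range (0..127)");
    return static_cast<wchar_t>(c);
}

static_assert(widen_ascii('a') == L'a');  // the throw is never reached

// constexpr wchar_t bad = widen_ascii('\xE4');  // would not compile: evaluating a throw
//                                                // is not allowed in a constant expression.
// The same call in a non-constant context would instead throw std::out_of_range at run time.
```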
Here is a refinement of the now-common template-based solution which:
- preserves the array[len] C++ type of the C strings rather than decaying them to pointers, which means you can call sizeof() on the result and get the size of the string+NUL, not the size of a pointer, just as if you had the original string there;
- works even if the strings in different encodings have different lengths in code units (which is virtually guaranteed if the strings have non-ASCII text);
- does not incur any runtime overhead, nor does it attempt or need to do encoding conversion at runtime.
Credit: This refinement starts with the original template idea from Mark Ransom and #2 from zett42 and borrows some ideas from, but fixes the size limitations of, Chris Kushnir's answer.
This code does char and wchar_t but it is trivial to extend it to char8_t+char16_t+char32_t
// generic utility for C++ pre-processor concatenation
// - avoids a pre-processor issue if x and y have macros inside
// (helper names avoid a leading underscore: names like _CPP_CONCAT are reserved)
#define CPP_CONCAT_IMPL(x, y) x ## y
#define CPP_CONCAT(x, y) CPP_CONCAT_IMPL(x, y)
// now onto stringlit()
#include <cstddef>
template<std::size_t SZ0, std::size_t SZ1>
constexpr
auto stringlit_impl(char c,
const char (&s0) [SZ0],
const wchar_t (&s1) [SZ1]) -> const char(&)[SZ0]
{
return s0;
}
template<std::size_t SZ0, std::size_t SZ1>
constexpr
auto stringlit_impl(wchar_t c,
const char (&s0) [SZ0],
const wchar_t (&s1) [SZ1]) -> const wchar_t(&)[SZ1]
{
return s1;
}
#define stringlit(code_unit, lit) \
stringlit_impl(code_unit (), lit, CPP_CONCAT(L, lit))
Rather than selecting via a template parameter, we define one overload per char encoding, distinguished only by the type of a dummy first argument. Each overload returns the original array type with the original bounds. The selector is a single value-initialized character of the desired type (its value is not important). We cannot simply select on a template type parameter, because then the overloads would have identical parameter lists and differ only in their return types. This code also works without the constexpr. Note we are returning a reference to an array (which is possible in C++), not an array (which is not allowed in C++). The trailing return type syntax is optional, but a heck of a lot more readable than the alternative, something like const char (&stringlit(...params here...))[SZ0] ugh.
I compiled this with clang 9.0.8 and MSVC++ from Visual Studio 2019 16.7 (aka _MSC_VER 1927 aka pdb ver 14.27). I had c++2a/c++latest enabled, but I think C++14 or 17 is sufficient for this code.
Enjoy!
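To check the claim that the array type survives, here is a small self-contained sketch that restates stringlit (using the helper names CPP_CONCAT_IMPL and stringlit_impl, since identifiers starting with an underscore followed by a capital letter are reserved) and verifies the sizes with `static_assert`. The `Traits` template is a hypothetical usage example; it should build as C++11 or later.

```cpp
#include <cstddef>

#define CPP_CONCAT_IMPL(x, y) x ## y
#define CPP_CONCAT(x, y) CPP_CONCAT_IMPL(x, y)

// The overload is chosen by the type of the dummy first argument.
template <std::size_t SZ0, std::size_t SZ1>
constexpr auto stringlit_impl(char, const char (&s0)[SZ0], const wchar_t (&)[SZ1])
    -> const char (&)[SZ0] { return s0; }

template <std::size_t SZ0, std::size_t SZ1>
constexpr auto stringlit_impl(wchar_t, const char (&)[SZ0], const wchar_t (&s1)[SZ1])
    -> const wchar_t (&)[SZ1] { return s1; }

#define stringlit(code_unit, lit) \
    stringlit_impl(code_unit (), lit, CPP_CONCAT(L, lit))

// Hypothetical generic holder: the reference keeps the full array type.
template <typename CharT>
struct Traits {
    static constexpr auto& hello = stringlit(CharT, "hello");
};

// sizeof sees the whole array (code units plus NUL), not a pointer.
static_assert(sizeof(stringlit(char, "abc")) == 4, "char[4]");
static_assert(sizeof(stringlit(wchar_t, "abc")) == 4 * sizeof(wchar_t), "wchar_t[4]");
static_assert(sizeof(Traits<wchar_t>::hello) == 6 * sizeof(wchar_t), "wchar_t[6]");
```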
Here's an alternative implementation based on @zett42's answer. Feedback is welcome.
#include <iostream>
#include <tuple>
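// Paste an L prefix onto a narrow literal (the extra level lets macro arguments expand first)
#define TOWSTRING_(x) L##x
#define TOWSTRING(x) TOWSTRING_(x)
// Put both the narrow and the wide literal in a tuple, then pick the element of type const C*
#define MAKE_LPCTSTR(C, STR) (std::get<const C*>(std::tuple<const char*, const wchar_t*>(STR, TOWSTRING(STR))))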
template<typename CharType>
class StringTraits {
public:
static constexpr const CharType* WHITESPACE_STR = MAKE_LPCTSTR(CharType, "abc");
};
typedef StringTraits<char> AStringTraits;
typedef StringTraits<wchar_t> WStringTraits;
int main(int argc, char** argv) {
std::cout << "Narrow string literal: " << AStringTraits::WHITESPACE_STR << std::endl;
std::wcout << "Wide string literal : " << WStringTraits::WHITESPACE_STR << std::endl;
return 0;
}
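One trade-off of this tuple-based version is that the selected literal decays to `const C*`, so `sizeof` reports the pointer size and the length has to be recomputed. A minimal sketch of that, assuming C++17 so that `std::char_traits<C>::length` is usable in constant expressions (the variable `w` is just for illustration):

```cpp
#include <string>   // std::char_traits
#include <tuple>

#define TOWSTRING_(x) L##x
#define TOWSTRING(x) TOWSTRING_(x)
#define MAKE_LPCTSTR(C, STR) \
    (std::get<const C*>(std::tuple<const char*, const wchar_t*>(STR, TOWSTRING(STR))))

// The selection still happens at compile time...
constexpr const wchar_t* w = MAKE_LPCTSTR(wchar_t, "abc");
// ...but the result is a pointer, so the length must be recomputed.
static_assert(sizeof(w) == sizeof(const wchar_t*), "pointer, not array");
static_assert(std::char_traits<wchar_t>::length(w) == 3, "length recomputed");
```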
I've just come up with a compact answer, which is similar to the other C++17 versions. Like them, it relies on implementation-defined behavior, specifically the execution character sets. It supports converting ASCII and ISO-8859-1 (when that is the narrow execution encoding) to UTF-16 or UTF-32 wchar_t, UTF-16 char16_t and UTF-32 char32_t. UTF-8 input is not supported, but more elaborate conversion code is feasible.
template <typename Ch, size_t S>
constexpr auto any_string(const char (&literal)[S]) -> const array<Ch, S> {
array<Ch, S> r = {};
for (size_t i = 0; i < S; i++)
r[i] = static_cast<Ch>(static_cast<unsigned char>(literal[i])); // avoid sign extension of bytes 0x80..0xFF
return r;
}
Full example follows:
$ cat any_string.cpp
#include <array>
#include <fstream>
using namespace std;
template <typename Ch, size_t S>
constexpr auto any_string(const char (&literal)[S]) -> const array<Ch, S> {
array<Ch, S> r = {};
for (size_t i = 0; i < S; i++)
r[i] = static_cast<Ch>(static_cast<unsigned char>(literal[i]));
return r;
}
int main(void)
{
auto s = any_string<char>("Hello");
auto ws = any_string<wchar_t>(", ");
auto s16 = any_string<char16_t>("World");
auto s32 = any_string<char32_t>("!\n");
ofstream f("s.txt");
f << s.data();
f.close();
wofstream wf("ws.txt");
wf << ws.data();
wf.close();
basic_ofstream<char16_t> f16("s16.txt");
f16 << s16.data();
f16.close();
basic_ofstream<char32_t> f32("s32.txt");
f32 << s32.data();
f32.close();
return 0;
}
$ c++ -o any_string any_string.cpp -std=c++17
$ ./any_string
$ cat s.txt ws.txt s16.txt s32.txt
Hello, World!
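Because any_string is constexpr, the conversion can also be forced to run during compilation by declaring the result constexpr, and the array can then be viewed as a std::basic_string_view without copying. The sketch below assumes C++17 and restates any_string with the unsigned char step, so that a byte such as 0xE9 becomes U+00E9 rather than a sign-extended value where plain char is signed.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// any_string, converting through unsigned char (assumes ASCII or
// ISO-8859-1 narrow literals).
template <typename Ch, std::size_t S>
constexpr std::array<Ch, S> any_string(const char (&literal)[S])
{
    std::array<Ch, S> r{};
    for (std::size_t i = 0; i < S; ++i)
        r[i] = static_cast<Ch>(static_cast<unsigned char>(literal[i]));
    return r;
}

// constexpr forces the conversion to happen at compile time.
constexpr auto s16 = any_string<char16_t>("World");
// View without the terminating NUL.
constexpr std::u16string_view v16{s16.data(), s16.size() - 1};
static_assert(v16 == u"World", "");
// "\xE9" is the single byte 0xE9 (e-acute in ISO-8859-1).
static_assert(any_string<char32_t>("\xE9")[0] == U'\u00E9', "");
```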
A variation of zett42's Method 2 above.
Has the advantage of supporting all char types (for literals that can be represented as char[]) and preserving the proper string literal array type.
First the template functions:
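#include <cstddef>      // size_t
#include <type_traits>  // std::is_same_v (C++17); char8_t itself requires C++20
template<typename CHAR_T>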
constexpr
auto LiteralChar(
char A,
wchar_t W,
char8_t U8,
char16_t U16,
char32_t U32
) -> CHAR_T
{
if constexpr( std::is_same_v<CHAR_T, char> ) return A;
else if constexpr( std::is_same_v<CHAR_T, wchar_t> ) return W;
else if constexpr( std::is_same_v<CHAR_T, char8_t> ) return U8;
else if constexpr( std::is_same_v<CHAR_T, char16_t> ) return U16;
else if constexpr( std::is_same_v<CHAR_T, char32_t> ) return U32;
}
template<typename CHAR_T, size_t SIZE>
constexpr
auto LiteralStr(
const char (&A) [SIZE],
const wchar_t (&W) [SIZE],
const char8_t (&U8) [SIZE],
const char16_t (&U16)[SIZE],
const char32_t (&U32)[SIZE]
) -> const CHAR_T(&)[SIZE]
{
if constexpr( std::is_same_v<CHAR_T, char> ) return A;
else if constexpr( std::is_same_v<CHAR_T, wchar_t> ) return W;
else if constexpr( std::is_same_v<CHAR_T, char8_t> ) return U8;
else if constexpr( std::is_same_v<CHAR_T, char16_t> ) return U16;
else if constexpr( std::is_same_v<CHAR_T, char32_t> ) return U32;
}
Then the macros:
#define CMK_LC(CHAR_T, LITERAL) \
LiteralChar<CHAR_T>( LITERAL, L ## LITERAL, u8 ## LITERAL, u ## LITERAL, U ## LITERAL )
#define CMK_LS(CHAR_T, LITERAL) \
LiteralStr<CHAR_T>( LITERAL, L ## LITERAL, u8 ## LITERAL, u ## LITERAL, U ## LITERAL )
Then use:
template<typename CHAR_T>
class StringTraits {
public:
struct LC { // literal character
static constexpr CHAR_T Null = CMK_LC(CHAR_T, '\0');
static constexpr CHAR_T Space = CMK_LC(CHAR_T, ' ');
};
struct LS { // literal string
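// can't seem to avoid having to specify the size; these are references
// because an array cannot be initialized from another array
static constexpr const CHAR_T (&Space) [2] = CMK_LS(CHAR_T, " ");
static constexpr const CHAR_T (&Ellipsis) [4] = CMK_LS(CHAR_T, "...");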
};
};
auto char_space { StringTraits<char>::LC::Space };
auto wchar_space { StringTraits<wchar_t>::LC::Space };
auto char_ellipsis { StringTraits<char>::LS::Ellipsis }; // note: const char*
auto wchar_ellipsis { StringTraits<wchar_t>::LS::Ellipsis }; // note: const wchar_t*
auto (& char_ellipsis_array) [4] { StringTraits<char>::LS::Ellipsis };
auto (&wchar_ellipsis_array) [4] { StringTraits<wchar_t>::LS::Ellipsis };
(Open question: what is the syntax to get a local copy?)
Admittedly, the syntax to preserve the string literal array type is a bit of a burden, but not overly so.
Again, this only works for literals that have the same number of code units in every character type.
To make LiteralStr support all literals for all types, you would likely need to pass pointers as parameters and return const CHAR_T* instead of const CHAR_T(&)[SIZE]. I don't think LiteralChar can be made to support characters that need more than one code unit.
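Following the pointer idea just mentioned, a minimal sketch of such a variant is below. LiteralPtr and CMK_LP are hypothetical names; char8_t is left out so that it builds as C++17, and the price is that sizeof no longer gives the length.

```cpp
#include <type_traits>

// Only pointers are passed, so each encoding may have a different length.
template <typename CHAR_T>
constexpr const CHAR_T* LiteralPtr(const char* A, const wchar_t* W,
                                   const char16_t* U16, const char32_t* U32)
{
    static_assert(std::is_same_v<CHAR_T, char> || std::is_same_v<CHAR_T, wchar_t> ||
                  std::is_same_v<CHAR_T, char16_t> || std::is_same_v<CHAR_T, char32_t>,
                  "unsupported character type");
    if constexpr (std::is_same_v<CHAR_T, char>) return A;
    else if constexpr (std::is_same_v<CHAR_T, wchar_t>) return W;
    else if constexpr (std::is_same_v<CHAR_T, char16_t>) return U16;
    else return U32;
}

#define CMK_LP(CHAR_T, LITERAL) \
    LiteralPtr<CHAR_T>(LITERAL, L ## LITERAL, u ## LITERAL, U ## LITERAL)

// e-acute is one UTF-16 code unit, whatever its narrow length is.
static_assert(CMK_LP(char16_t, "\u00e9")[0] == u'\u00e9', "");
static_assert(CMK_LP(char16_t, "\u00e9")[1] == u'\0', "");
```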
[EDIT]
Applying Louis Semprini's per-encoding SIZE support to LiteralStr gives:
template<typename CHAR_T,
size_t SIZE_A, size_t SIZE_W, size_t SIZE_U8, size_t SIZE_U16, size_t SIZE_U32,
size_t SIZE_R =
std::is_same_v<CHAR_T, char> ? SIZE_A :
std::is_same_v<CHAR_T, wchar_t> ? SIZE_W :
std::is_same_v<CHAR_T, char8_t> ? SIZE_U8 :
std::is_same_v<CHAR_T, char16_t> ? SIZE_U16 :
std::is_same_v<CHAR_T, char32_t> ? SIZE_U32 : 0
>
constexpr
auto LiteralStr(
const char (&A) [SIZE_A],
const wchar_t (&W) [SIZE_W],
const char8_t (&U8) [SIZE_U8],
const char16_t (&U16) [SIZE_U16],
const char32_t (&U32) [SIZE_U32]
) -> const CHAR_T(&)[SIZE_R]
{
if constexpr( std::is_same_v<CHAR_T, char> ) return A;
else if constexpr( std::is_same_v<CHAR_T, wchar_t> ) return W;
else if constexpr( std::is_same_v<CHAR_T, char8_t> ) return U8;
else if constexpr( std::is_same_v<CHAR_T, char16_t> ) return U16;
else if constexpr( std::is_same_v<CHAR_T, char32_t> ) return U32;
}
It is also possible to use a simpler syntax to declare the members;
for example, the members of StringTraits::LS can be declared as constexpr auto &,
so
static constexpr const CHAR_T (&Ellipsis) [4] = CMK_LS(CHAR_T, "...");
becomes
static constexpr auto & Ellipsis { CMK_LS(CHAR_T, "...") };
When using CMK_LS(char, "literal"), VS 2019 converts any character in the literal that cannot be represented in the narrow execution character set to '?'; I'm not sure what other compilers do.

How to concatenate static strings at compile time?

I am trying to use templates to create an analogue of type_info::name() that emits the const-qualified name. E.g. typeid(bool const).name() is "bool", but I want to see "bool const". So for generic types I define:
template<class T> struct type_name { static char const *const _; };
template<class T> char const *const type_name<T>::_ = "type unknown";
template<> char const *const type_name<bool>::_ = "bool";
template<> char const *const type_name<int>::_ = "int";
//etc.
Then type_name<bool>::_ is "bool". For const types I could obviously add a separate definition for each type, e.g. template<> char const *const type_name<bool const>::_ = "bool const"; and so on. But I thought I would try a partial specialization and a concatenation macro to derive, in one line, the const-qualified name for any type whose non-const-qualified name has already been defined. So
#define CAT(A, B) A B
template<class T> char const *const type_name<T const>::_
= CAT(type_name<T>::_, " const"); // line [1]
But then type_name<bool const>::_ gives me error C2143: syntax error: missing ';' before 'string' for line [1]. I think that type_name<bool>::_ is a static string known at compile time, so how do I get it concatenated with " const" at compile time?
I tried a simpler example, but got the same problem:
char str1[4] = "int";
char *str2 = MYCAT(str1, " const");
I recently revisited this problem, and found that the previous answer I gave produced ridiculously long compile times when concatenating more than a handful of strings.
I have produced a new solution which leverages constexpr functions to remove the recursive templates responsible for the long compilation time.
#include <array>
#include <iostream>
#include <string_view>
template <std::string_view const&... Strs>
struct join
{
// Join all strings into a single std::array of chars
static constexpr auto impl() noexcept
{
constexpr std::size_t len = (Strs.size() + ... + 0);
std::array<char, len + 1> arr{};
auto append = [i = 0, &arr](auto const& s) mutable {
for (auto c : s) arr[i++] = c;
};
(append(Strs), ...);
arr[len] = 0;
return arr;
}
// Give the joined string static storage
static constexpr auto arr = impl();
// View as a std::string_view
static constexpr std::string_view value {arr.data(), arr.size() - 1};
};
// Helper to get the value out
template <std::string_view const&... Strs>
static constexpr auto join_v = join<Strs...>::value;
// Hello world example
static constexpr std::string_view hello = "hello";
static constexpr std::string_view space = " ";
static constexpr std::string_view world = "world";
static constexpr std::string_view bang = "!";
// Join them all together
static constexpr auto joined = join_v<hello, space, world, bang>;
int main()
{
std::cout << joined << '\n';
}
This gives much quicker compile times, even with a large quantity of strings to concatenate.
I personally find this solution easier to follow as the constexpr impl function is akin to how this could be solved at runtime.
Edited with improvements thanks to @Jarod42.
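Applied to the type_name question, the same join lets a single partial specialization produce the const-qualified name. This sketch assumes the names are switched from char const* to std::string_view (C++17); the type_name specializations and const_suffix are illustrative, and join is a condensed copy of the code above.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

template <std::string_view const&... Strs>
struct join {
    static constexpr auto impl() noexcept {
        constexpr std::size_t len = (Strs.size() + ... + 0);
        std::array<char, len + 1> arr{};
        auto append = [i = 0, &arr](auto const& s) mutable {
            for (auto c : s) arr[i++] = c;
        };
        (append(Strs), ...);
        arr[len] = 0;
        return arr;
    }
    static constexpr auto arr = impl();
    static constexpr std::string_view value{arr.data(), arr.size() - 1};
};

// Hypothetical type_name with string_view members; primary left undefined.
template <class T> struct type_name;
template <> struct type_name<bool> { static constexpr std::string_view value = "bool"; };
template <> struct type_name<int>  { static constexpr std::string_view value = "int"; };

inline constexpr std::string_view const_suffix = " const";

// One partial specialization covers every const-qualified type.
template <class T> struct type_name<T const> {
    static constexpr std::string_view value = join<type_name<T>::value, const_suffix>::value;
};

static_assert(type_name<bool const>::value == "bool const");
```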
EDIT: See my new, improved answer (the constexpr-based one above).
Building on @Hedede's answer, if we allow recursive templates, then concatenation of many strings can be implemented as:
#include <string_view>
#include <utility>
#include <iostream>
namespace impl
{
/// Base declaration of our constexpr string_view concatenation helper
template <std::string_view const&, typename, std::string_view const&, typename>
struct concat;
/// Specialisation to yield indices for each char in both provided string_views,
/// allows us flatten them into a single char array
template <std::string_view const& S1,
std::size_t... I1,
std::string_view const& S2,
std::size_t... I2>
struct concat<S1, std::index_sequence<I1...>, S2, std::index_sequence<I2...>>
{
static constexpr const char value[]{S1[I1]..., S2[I2]..., 0};
};
} // namespace impl
/// Base definition for compile time joining of strings
template <std::string_view const&...> struct join;
/// When no strings are given, provide an empty literal
template <>
struct join<>
{
static constexpr std::string_view value = "";
};
/// Base case for recursion where we reach a pair of strings, we concatenate
/// them to produce a new constexpr string
template <std::string_view const& S1, std::string_view const& S2>
struct join<S1, S2>
{
static constexpr std::string_view value =
impl::concat<S1,
std::make_index_sequence<S1.size()>,
S2,
std::make_index_sequence<S2.size()>>::value;
};
/// Main recursive definition for constexpr joining, pass the tail down to our
/// base case specialisation
template <std::string_view const& S, std::string_view const&... Rest>
struct join<S, Rest...>
{
static constexpr std::string_view value =
join<S, join<Rest...>::value>::value;
};
/// Join constexpr string_views to produce another constexpr string_view
template <std::string_view const&... Strs>
static constexpr auto join_v = join<Strs...>::value;
namespace str
{
static constexpr std::string_view a = "Hello ";
static constexpr std::string_view b = "world";
static constexpr std::string_view c = "!";
}
int main()
{
constexpr auto joined = join_v<str::a, str::b, str::c>;
std::cout << joined << '\n';
return 0;
}
I used C++17 with std::string_view because its size() method is handy, but this could easily be adapted to use const char[] literals, as @Hedede did.
This answer is intended as a response to the title of the question, rather than the more niche problem described.
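Picking up the remark that this could be adapted to plain const char[] literals, one possible sketch is below. Note that it swaps the recursive templates for a single constexpr function (closer to the newer answer above) and returns a std::array; concat is a hypothetical name and C++17 is assumed.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Concatenate NUL-terminated char arrays into one std::array.
template <std::size_t... Ns>
constexpr auto concat(const char (&... strs)[Ns])
{
    // Total characters, dropping each input's NUL, plus one final NUL.
    std::array<char, (Ns + ... + 0) - sizeof...(Ns) + 1> out{};
    std::size_t pos = 0;
    auto append = [&](const char* s, std::size_t n) {
        for (std::size_t i = 0; i + 1 < n; ++i)  // skip the terminating NUL
            out[pos++] = s[i];
    };
    (append(strs, Ns), ...);
    out[pos] = '\0';
    return out;
}

// A namespace-scope constexpr variable gives the result static storage.
constexpr auto joined = concat("Hello ", "world", "!");
constexpr std::string_view joined_sv{joined.data(), joined.size() - 1};
static_assert(joined_sv == "Hello world!");
```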

literal charT array

I'm working on an API for an algorithm that processes text.
I would like it NOT to depend on the character type (char, wchar_t, ...), so I have made template classes with a template parameter CharT.
These classes use std::basic_string<CharT>.
I have to initialize a lot of basic_strings with default values.
If CharT is char I can assign the literal "default_text", or if CharT is wchar_t I can assign L"default_text", but this is not generic (it depends on CharT).
Is there any way to initialize the basic_string generically?
In case it helps, my code is C++11.
Since your code is generic, I guess that the literal you have only contains ASCII characters. Otherwise, you'd have to transcode it on the fly which is going to be a lot of hassle. In order to promote a pure-ASCII string literal of type char[] to another character type, you can simply promote each character individually.
If you're going to initialize a std::basic_string anyway, you can as well do it right away. The following function takes a char[] string literal and a target type and promotes it to a string of that type.
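#include <cstddef>
#include <cstring>
#include <string>
template <typename CharT>
std::basic_string<CharT>
as_string(const char *const text)
{
const auto length = std::strlen(text);
auto string = std::basic_string<CharT> {};
string.resize(length);
for (auto i = std::size_t {}; i < length; ++i)
string[i] = static_cast<CharT>(text[i]); // CharT{text[i]} can be a narrowing error, e.g. for char16_t
return string;
}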
It can be used like this.
std::cout << as_string<char>("The bats are in the belfry") << '\n';
std::wcout << as_string<wchar_t>("The dew is on the moor") << L'\n';
But you've asked for a character array, not a std::basic_string. In C++14, constexpr can help a lot with this. Be warned that you'd need the most recent compilers for this to be supported well.
The first thing we'll have to do is roll our own version of std::array that provides constexpr operations (in C++14, std::array's non-const operator[] is not constexpr). You can get as fancy as you want, but I'll keep it simple here.
template <typename T, std::size_t N>
struct array { T data[N]; };
Next, we also need a constexpr version of std::strlen.
template <typename CharT>
constexpr auto
cstrlen(const CharT *const text) noexcept
{
auto length = std::size_t {};
for (auto s = text; *s != CharT {0}; ++s)
++length;
return length;
}
Now we can write a constexpr function that promotes a string literal for us.
#include <stdexcept> // std::invalid_argument
template <typename CharT, std::size_t Length>
constexpr auto
as_array(const char *const text)
{
auto characters = array<CharT, Length + 1> {};
if (cstrlen(text) != Length)
throw std::invalid_argument {"Don't lie about the length!"};
for (auto i = std::size_t {}; i < Length; ++i)
characters.data[i] = text[i];
characters.data[Length] = CharT {0};
return characters;
}
It might be convenient to wrap it into a macro. I'm sorry for that.
#define AS_ARRAY(Type, Text) as_array<Type, cstrlen(Text)>(Text).data
It can be used like this.
std::cout << AS_ARRAY(char, "The bats are in the belfry") << '\n';
std::wcout << AS_ARRAY(wchar_t, "The dew is on the moor") << L'\n';
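With C++17 the hand-rolled array and the length macro are no longer needed: std::array's non-const operator[] is constexpr, and the length can be deduced from the array reference. A sketch of that, applied to the question's default_text case; widen_literal and Defaults are hypothetical names and ASCII input is assumed.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Length N (including the NUL) is deduced from the literal's array type.
template <typename CharT, std::size_t N>
constexpr std::array<CharT, N> widen_literal(const char (&text)[N])
{
    std::array<CharT, N> out{};
    for (std::size_t i = 0; i < N; ++i)
        out[i] = static_cast<CharT>(text[i]);  // assumes ASCII input
    return out;
}

// The question's use case: a CharT-generic default value.
template <typename CharT>
struct Defaults {
    // Converted at compile time; includes the terminating NUL.
    static constexpr auto text = widen_literal<CharT>("default_text");
    static std::basic_string<CharT> make()
    {
        return std::basic_string<CharT>(text.data(), text.size() - 1);
    }
};

static_assert(Defaults<wchar_t>::text.size() == 13);
static_assert(Defaults<char16_t>::text[0] == u'd');
```

The static member keeps one converted copy per character type, and make() builds the std::basic_string<CharT> the question asks for at runtime.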