literal charT array - c++

I'm working on an API for algorithms involving text.
I would like to make it NOT dependent on the character type (char,wchar_t...), so I have made template classes with a template parameter CharT.
These classes use std::basic_string<CharT>.
I have to initialize a lot of basic_string with default values.
If CharT is char I can assign the literal "default_text", and if CharT is wchar_t I can assign L"default_text", but this is not generic (it depends on CharT).
Can you think of any way to initialize the basic_string generically?
If that may help, my code is in C++11.

Since your code is generic, I guess that the literal you have only contains ASCII characters. Otherwise, you'd have to transcode it on the fly which is going to be a lot of hassle. In order to promote a pure-ASCII string literal of type char[] to another character type, you can simply promote each character individually.
If you're going to initialize a std::basic_string anyway, you can as well do it right away. The following function takes a char[] string literal and a target type and promotes it to a string of that type.
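#include <cstddef>
#include <cstring>
#include <string>

// Promote a pure-ASCII narrow string to std::basic_string<CharT>.
// The explicit return type keeps this valid C++11.
template <typename CharT>
std::basic_string<CharT>
as_string(const char *const text)
{
    const auto length = std::strlen(text);
    auto string = std::basic_string<CharT> {};
    string.resize(length);
    for (auto i = std::size_t {}; i < length; ++i)
        string[i] = static_cast<CharT>(text[i]);  // a cast avoids narrowing errors from brace-init
    return string;
}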
It can be used like this.
std::cout << as_string<char>("The bats are in the belfry") << '\n';
std::wcout << as_string<wchar_t>("The dew is on the moor") << L'\n';
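To connect this back to the question, the helper can feed the default values of a class template. The sketch below restates as_string with an explicit return type so that it stands alone; Settings and its members are hypothetical names used only for illustration, and the input is assumed to be pure ASCII.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Restated helper: promotes a pure-ASCII narrow string to basic_string<CharT>.
template <typename CharT>
std::basic_string<CharT> as_string(const char* text)
{
    const std::size_t length = std::strlen(text);
    std::basic_string<CharT> result(length, CharT());
    for (std::size_t i = 0; i < length; ++i)
        result[i] = static_cast<CharT>(text[i]);
    return result;
}

// Hypothetical class template whose defaults do not depend on the literal prefix.
template <typename CharT>
struct Settings
{
    std::basic_string<CharT> separator = as_string<CharT>(", ");
    std::basic_string<CharT> placeholder = as_string<CharT>("default_text");
};
```

The conversion runs once per object construction; if that matters, the defaults could be cached in function-local statics instead.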
But you've asked for a character array, not a std::basic_string. In C++14, constexpr can help a lot with this. Be warned that you'd need the most recent compilers for this to be supported well.
The first thing we'll have to do is roll our own version of std::array that provides constexpr operations. You can get as fancy as you want, but I'll keep it simple here.
template <typename T, std::size_t N>
struct array { T data[N]; };
Next, we also need a constexpr version of std::strlen.
template <typename CharT>
constexpr auto
cstrlen(const CharT *const text) noexcept
{
auto length = std::size_t {};
for (auto s = text; *s != CharT {0}; ++s)
++length;
return length;
}
Now we can write a constexpr function that promotes a string literal for us.
template <typename CharT, std::size_t Length>
constexpr auto
as_array(const char *const text)
{
auto characters = array<CharT, Length + 1> {};
if (cstrlen(text) != Length)
throw std::invalid_argument {"Don't lie about the length!"};
for (auto i = std::size_t {}; i < Length; ++i)
characters.data[i] = text[i];
characters.data[Length] = CharT {0};
return characters;
}
It might be convenient to wrap it into a macro. I'm sorry for that.
#define AS_ARRAY(Type, Text) as_array<Type, cstrlen(Text)>(Text).data
It can be used like this.
std::cout << AS_ARRAY(char, "The bats are in the belfry") << '\n';
std::wcout << AS_ARRAY(wchar_t, "The dew is on the moor") << L'\n';
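If the macro is unwelcome, one possible alternative is to let the compiler deduce the length from the literal's array type. This is a different technique from the one above, shown only as a sketch: char_array and widen_literal are hypothetical names, C++14 is assumed, and the input must still be pure ASCII.

```cpp
#include <cstddef>

// Minimal constexpr-friendly array wrapper, mirroring the `array` struct above.
template <typename T, std::size_t N>
struct char_array { T data[N]; };

// N is deduced from the literal's array type and already includes
// the terminating null character, so no separate length is needed.
template <typename CharT, std::size_t N>
constexpr char_array<CharT, N> widen_literal(const char (&text)[N])
{
    char_array<CharT, N> result {};
    for (std::size_t i = 0; i < N; ++i)
        result.data[i] = static_cast<CharT>(text[i]);
    return result;
}
```

Because the parameter is a reference to an array, this only accepts actual arrays such as string literals, not arbitrary const char* pointers.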

Related

C++ How can I improve this bit of template meta-program to give back the array including the size?

I've got a utility called choose_literal which chooses a literal string encoded as char*, wchar_t*, char8_t*, char16_t*, or char32_t* depending on the desired type (the choice).
It looks like this:
template <typename T>
constexpr auto choose_literal(const char * psz, const wchar_t * wsz, const CHAR8_T * u8z, const char16_t * u16z, const char32_t * u32z) {
if constexpr (std::is_same_v<T, char>)
return psz;
if constexpr (std::is_same_v<T, wchar_t>)
return wsz;
#ifdef __cpp_char8_t // char8_t is a keyword, not a macro, so test the feature macro instead
if constexpr (std::is_same_v<T, char8_t>)
return u8z;
#endif
if constexpr (std::is_same_v<T, char16_t>)
return u16z;
if constexpr (std::is_same_v<T, char32_t>)
return u32z;
}
I supply a little preprocessor macro to make this work w/o having to type each of those string encodings manually:
// generates the appropriate character literal using preprocessor voodoo
// usage: LITERAL(char-type, "literal text")
#define LITERAL(T,x) details::choose_literal<T>(x, L##x, u8##x, u##x, U##x)
This of course only works for literal strings which can be encoded in the target format by the compiler - but something like an empty string can be, as can ASCII characters (i.e. a-z, 0-9, etc., which have representations in all of those encodings).
e.g. here's a trivial bit of code that will return the correct empty-string given a valid character type 'T':
template <typename T>
constexpr const T * GetBlank() {
return LITERAL(T, "");
}
This is great as far as it goes, and it works well enough in my code.
What I'd like to do is to refactor this such that I get back the character-array including its size, as if I'd written something like:
const char blank[] = "";
or
const wchar_t blank[] = L"";
Which allows the compiler to know the length of the string-literal, not just its address.
My choose_literal<T>(str) returns only the const T * rather than the const T (&)[size] which would be ideal.
In general I'd love to be able to pass such entities around intact - rather than have them devolve into just a pointer.
But in this specific case, is there a technique you might point me towards that allows me to declare a struct with a data-member for the desired encoding which then also knows its array-length?
A little bit of constexpr recursion magic allows you to return a string_view of the appropriate type.
#include <string_view>
#include <type_traits>
#include <iostream>
template <typename T, class Choice, std::size_t N, class...Rest>
constexpr auto choose_literal(Choice(& choice)[N], Rest&...rest)
{
using const_char_type = Choice;
using char_type = std::remove_const_t<const_char_type>;
if constexpr (std::is_same_v<T, char_type>)
{
constexpr auto extent = N;
return std::basic_string_view<char_type>(choice, extent - 1);
}
else
{
return choose_literal<T>(rest...);
}
}
int main()
{
auto clit = choose_literal<char>("hello", L"hello");
std::cout << clit;
auto wclit = choose_literal<wchar_t>("hello", L"hello");
std::wcout << wclit;
}
https://godbolt.org/z/4roZ_O
If it were me, I'd probably want to wrap this and other functions into a constexpr class which offers common services like printing the literal in the correct form depending on the stream type, and creating the correct kind of string from the literal.
For example:
#include <string_view>
#include <type_traits>
#include <iostream>
#include <tuple>
template <typename T, class Choice, std::size_t N, class...Rest>
constexpr auto choose_literal(Choice(& choice)[N], Rest&...rest)
{
using const_char_type = Choice;
using char_type = std::remove_const_t<const_char_type>;
if constexpr (std::is_same_v<T, char_type>)
{
constexpr auto extent = N;
return std::basic_string_view<char_type>(choice, extent - 1);
}
else
{
return choose_literal<T>(rest...);
}
}
template<class...Choices>
struct literal_chooser
{
constexpr literal_chooser(Choices&...choices)
: choices_(choices...)
{}
template<class T>
constexpr auto choose()
{
auto invoker = [](auto&...choices)
{
return choose_literal<T>(choices...);
};
return std::apply(invoker, choices_);
}
std::tuple<Choices&...> choices_;
};
template<class Char, class...Choices>
std::basic_ostream<Char>& operator<<(std::basic_ostream<Char>& os, literal_chooser<Choices...> chooser)
{
return os << chooser.template choose<Char>();
}
template<class Char, class...Choices>
std::basic_string<Char> to_string(literal_chooser<Choices...> chooser)
{
auto sview = chooser.template choose<Char>();
return std::basic_string<Char>(sview.data(), sview.size());
}
int main()
{
auto lit = literal_chooser("hello", L"hello");
std::cout << lit << std::endl;
std::wcout << lit << std::endl;
auto s1 = to_string<char>(lit);
auto s2 = to_string<wchar_t>(lit);
std::cout << s1 << std::endl;
std::wcout << s2 << std::endl;
}
The use of the reference argument type Choices& is important. A C++ string literal is an lvalue of type array of const Char, and binding it to a reference keeps that array type. Passing by value would make the literal decay into a pointer, which would lose the information about the extent of the array.
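The difference between the two ways of passing a literal can be checked at compile time. The following is a small sketch, assuming C++17; both function names are made up for the illustration.

```cpp
#include <cstddef>
#include <type_traits>

// By value: the array argument decays, so T is deduced as a pointer type.
template <class T>
constexpr bool decays_to_pointer(T) { return std::is_pointer_v<T>; }

// By reference: T is deduced as the array type itself, extent included.
template <class T>
constexpr std::size_t extent_by_reference(T&) { return std::extent_v<T>; }
```

For example, decays_to_pointer("hello") is true, while extent_by_reference("hello") reports 6 (five characters plus the terminator).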
We can add other services, written in terms of the literal_chooser:
template<class Char, class...Choices>
constexpr std::size_t size(literal_chooser<Choices...> chooser)
{
auto sview = chooser.template choose<Char>();
return sview.size();
}
We're going to change the function so that it takes a const T (&)[size] for each input, and the return type is going to be decltype(auto). Using decltype(auto) prevents the return from decaying into a value, preserving things like references to arrays.
Updated function:
template <typename T, size_t N1, size_t N2, size_t N3, size_t N4>
constexpr decltype(auto) choose_literal(const char (&psz)[N1], const wchar_t (&wsz)[N2], const char16_t (&u16z)[N3], const char32_t (&u32z)[N4]) {
if constexpr (std::is_same<T, char>())
return psz;
if constexpr (std::is_same<T, wchar_t>())
return wsz;
if constexpr (std::is_same<T, char16_t>())
return u16z;
if constexpr (std::is_same<T, char32_t>())
return u32z;
}
In main, we can assign the result to something of type auto&&:
#define LITERAL(T,x) choose_literal<T>(x, L##x, u##x, U##x)
int main() {
constexpr auto&& literal = LITERAL(char, "hello");
return sizeof(literal); // Returns 6
}
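To see what decltype(auto) buys here, it may help to compare it with plain auto on the same return statement. This is only an illustrative sketch (C++17 assumed); the two function names are hypothetical.

```cpp
#include <cstddef>
#include <type_traits>

// Plain auto deduces by value, so the array reference decays to a pointer.
template <std::size_t N>
constexpr auto returns_pointer(const char (&text)[N]) { return text; }

// decltype(auto) keeps the declared type of `text`: a reference to the array.
template <std::size_t N>
constexpr decltype(auto) returns_array_ref(const char (&text)[N]) { return text; }
```

With the second form, sizeof on the result gives the size of the whole literal rather than the size of a pointer.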
Potential simplification
We can simplify the choose_literal function by making it recursive, that way it can be expanded for any number of types. This works without any changes to the LITERAL macro.
template<class T, class Char, size_t N, class... Rest>
constexpr decltype(auto) choose_literal(const Char(&result)[N], Rest const&... rest) {
if constexpr(std::is_same_v<T, Char>)
return result;
else
return choose_literal<T>(rest...);
}
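As a usage sketch, the recursive version can be combined with the LITERAL macro to rewrite the GetBlank helper from the question so that it keeps the array type. The code below repeats the recursive function to stay self-contained; get_blank is a hypothetical name and C++17 is assumed.

```cpp
#include <cstddef>
#include <type_traits>

template <class T, class Char, std::size_t N, class... Rest>
constexpr decltype(auto) choose_literal(const Char (&result)[N], Rest const&... rest)
{
    if constexpr (std::is_same_v<T, Char>)
        return result;                      // keeps const Char (&)[N]
    else
        return choose_literal<T>(rest...);  // try the next encoding
}

#define LITERAL(T, x) choose_literal<T>(x, L##x, u##x, U##x)

// Hypothetical helper: the empty string for T, returned as an array reference.
template <class T>
constexpr decltype(auto) get_blank() { return LITERAL(T, ""); }
```

If T matches none of the supplied encodings, the recursion runs out of arguments and the call fails to compile, which is arguably the desired behavior.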

How to define string literal with character type that depends on template parameter?

template<typename CharType>
class StringTraits {
public:
static const CharType NULL_CHAR = '\0';
static constexpr CharType* WHITESPACE_STR = " ";
};
typedef StringTraits<char> AStringTraits;
typedef StringTraits<wchar_t> WStringTraits;
I know I could do it with template specialization, but this would require some duplication (by defining string literals with and without L prefix).
Is there a simpler way to define const/constexpr char/wchar_t and char*/wchar_t* with same string literal in a template class?
There are several ways to do this, depending on the available version of the C++ standard.
If you have C++17 available, you can scroll down to Method 3, which is the most elegant solution in my opinion.
Note: Methods 1 and 3 assume that the characters of the string literal will be restricted to 7-bit ASCII. This requires that characters are in the range [0..127] and the execution character set is compatible with 7-bit ASCII (e. g. Windows-1252 or UTF-8). Otherwise the simple casting of char values to wchar_t used by these methods won't give the correct result.
Method 1 - aggregate initialization (C++03)
The simplest way is to define an array using aggregate initialization:
template<typename CharType>
class StringTraits {
public:
static const CharType NULL_CHAR = '\0';
static constexpr CharType WHITESPACE_STR[] = {'a','b','c',0};
};
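Here is a sketch of how such a member might be used from generic code. Delimiters and count_whitespace are hypothetical names; the code assumes C++11 or later because of constexpr and the range-based for loop, and it includes the out-of-class definition that compilers before C++17 need when the member is odr-used.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical traits class using the aggregate-initialization approach.
template <typename CharT>
struct Delimiters
{
    static constexpr CharT whitespace[] = {' ', '\t', '\n', 0};
};

// Before C++17, an odr-used static constexpr member also needs this
// out-of-class definition; since C++17 it is redundant but still accepted.
template <typename CharT>
constexpr CharT Delimiters<CharT>::whitespace[];

// Counts the characters of `text` that appear in the whitespace set.
template <typename CharT>
std::size_t count_whitespace(const std::basic_string<CharT>& text)
{
    const std::basic_string<CharT> set = Delimiters<CharT>::whitespace;
    std::size_t n = 0;
    for (CharT c : text)
        if (set.find(c) != std::basic_string<CharT>::npos)
            ++n;
    return n;
}
```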
Method 2 - template specialization and macro (C++03)
(Another variant is shown in this answer.)
The aggregate initialization method can be cumbersome for long strings. For more comfort, we can use a combination of template specialization and macros:
template< typename CharT > constexpr CharT const* NarrowOrWide( char const*, wchar_t const* );
template<> constexpr char const* NarrowOrWide< char >( char const* c, wchar_t const* )
{ return c; }
template<> constexpr wchar_t const* NarrowOrWide< wchar_t >( char const*, wchar_t const* w )
{ return w; }
#define TOWSTRING1(x) L##x
#define TOWSTRING(x) TOWSTRING1(x)
#define NARROW_OR_WIDE( C, STR ) NarrowOrWide< C >( ( STR ), TOWSTRING( STR ) )
Usage:
template<typename CharType>
class StringTraits {
public:
static constexpr CharType const* WHITESPACE_STR = NARROW_OR_WIDE( CharType, " " );
};
Explanation:
The template function NarrowOrWide() returns either the first (char const*) or the second (wchar_t const*) argument, depending on template parameter CharT.
The macro NARROW_OR_WIDE is used to avoid having to write both the narrow and the wide string literal. The macro TOWSTRING simply prepends the L prefix to the given string literal.
Of course the macro will only work if the range of characters is limited to basic ASCII, but this is usually sufficient. Otherwise one can use the NarrowOrWide() template function to define narrow and wide string literals separately.
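The reason for the two-level macro is that ## pastes its operands before they are macro-expanded. The sketch below shows the difference with a literal that is itself hidden behind a macro; WIDEN_DIRECT, WIDEN, and GREETING are hypothetical names, and std::wstring_view (C++17) is used only to make the result checkable at compile time.

```cpp
#include <string_view>

#define WIDEN_DIRECT(x) L##x      // pastes before the argument is expanded
#define WIDEN(x) WIDEN_DIRECT(x)  // expands the argument first, then pastes

#define GREETING "hello"          // hypothetical macro holding a literal

// WIDEN(GREETING) becomes L"hello". WIDEN_DIRECT(GREETING) would instead
// produce the unknown identifier LGREETING and fail to compile.
constexpr std::wstring_view wide_greeting = WIDEN(GREETING);
```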
Notes:
I would add a "unique" prefix to the macro names, something like the name of your library, to avoid conflicts with similar macros defined elsewhere.
Method 3 - array initialized via template parameter pack (C++17)
C++17 finally allows us to get rid of the macro and use a pure C++ solution. The solution uses template parameter pack expansion to initialize an array from a string literal while static_casting the individual characters to the desired type.
First we declare a str_array class, which is similar to std::array but tailored for constant null-terminated string (e. g. str_array::size() returns number of characters without '\0', instead of buffer size). This wrapper class is necessary, because a plain array cannot be returned from a function. It must be wrapped in a struct or class.
template< typename CharT, std::size_t Length >
struct str_array
{
constexpr CharT const* c_str() const { return data_; }
constexpr CharT const* data() const { return data_; }
constexpr CharT operator[]( std::size_t i ) const { return data_[ i ]; }
constexpr CharT const* begin() const { return data_; }
constexpr CharT const* end() const { return data_ + Length; }
constexpr std::size_t size() const { return Length; }
// TODO: add more members of std::basic_string
CharT data_[ Length + 1 ]; // +1 for null-terminator
};
So far, nothing special. The real trickery is done by the following str_array_cast() function, which initializes the str_array from a string literal while static_casting the individual characters to the desired type:
#include <cstddef>   // std::size_t
#include <stdexcept> // std::out_of_range
#include <utility>   // std::index_sequence
namespace detail {
template< typename ResT, typename SrcT >
constexpr ResT static_cast_ascii( SrcT x )
{
if( !( x >= 0 && x <= 127 ) )
throw std::out_of_range( "Character value must be in basic ASCII range (0..127)" );
return static_cast<ResT>( x );
}
template< typename ResElemT, typename SrcElemT, std::size_t N, std::size_t... I >
constexpr str_array< ResElemT, N - 1 > do_str_array_cast( const SrcElemT(&a)[N], std::index_sequence<I...> )
{
return { static_cast_ascii<ResElemT>( a[I] )..., 0 };
}
} //namespace detail
template< typename ResElemT, typename SrcElemT, std::size_t N, typename Indices = std::make_index_sequence< N - 1 > >
constexpr str_array< ResElemT, N - 1 > str_array_cast( const SrcElemT(&a)[N] )
{
return detail::do_str_array_cast< ResElemT >( a, Indices{} );
}
The template parameter pack expansion trickery is required, because constant arrays can only be initialized via aggregate initialization (e. g. const str_array<char,3> s = {'a','b','c',0};), so we have to "convert" the string literal to such an initializer list.
The code triggers a compile time error if any character is outside of basic ASCII range (0..127), for the reasons given at the beginning of this answer. There are code pages where 0..127 doesn't map to ASCII, so this check does not give 100% safety though.
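Stripped of the str_array wrapper and the ASCII check, the pack-expansion step on its own can be sketched as follows. This is not the code from the answer: it uses std::array instead, keeps the terminator as an ordinary element, and the names expand_literal and expand_literal_impl are made up.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Expands to {{ CharT(text[0]), CharT(text[1]), ... }}, one element per index.
template <typename CharT, std::size_t N, std::size_t... I>
constexpr std::array<CharT, N> expand_literal_impl(const char (&text)[N],
                                                   std::index_sequence<I...>)
{
    return {{ static_cast<CharT>(text[I])... }};
}

// Builds the index sequence 0..N-1 that drives the expansion above.
template <typename CharT, std::size_t N>
constexpr std::array<CharT, N> expand_literal(const char (&text)[N])
{
    return expand_literal_impl<CharT>(text, std::make_index_sequence<N>{});
}
```

Each index in the sequence selects one character of the literal, so the whole array is produced by a single braced initializer rather than by assignment in a loop.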
Usage:
template< typename CharT >
struct StringTraits
{
static constexpr auto WHITESPACE_STR = str_array_cast<CharT>( "abc" );
// Fails to compile (as intended), because characters are not basic ASCII.
//static constexpr auto WHITESPACE_STR1 = str_array_cast<CharT>( "Àâü" );
};
Here is a refinement of the now-common template-based solution which preserves the array[len] C++ type of the C strings rather than decaying them to pointers, which means you can call sizeof() on the result and get the size of the string+NUL, not the size of a pointer, just as if you had the original string there.
Works even if the strings in different encodings have different length in code units (which is virtually guaranteed if the strings have non-ASCII text).
Does not incur any runtime overhead nor does it attempt/need to do encoding conversion at runtime.
Credit: This refinement starts with the original template idea from Mark Ransom and #2 from zett42 and borrows some ideas from, but fixes the size limitations of, Chris Kushnir's answer.
This code handles char and wchar_t, but it is trivial to extend it to char8_t, char16_t, and char32_t.
// generic utility for C++ pre-processor concatenation
// - avoids a pre-processor issue if x and y have macros inside
#define _CPP_CONCAT(x, y) x ## y
#define CPP_CONCAT(x, y) _CPP_CONCAT(x, y)
// now onto stringlit()
template<size_t SZ0, size_t SZ1>
constexpr
auto _stringlit(char c,
const char (&s0) [SZ0],
const wchar_t (&s1) [SZ1]) -> const char(&)[SZ0]
{
return s0;
}
template<size_t SZ0, size_t SZ1>
constexpr
auto _stringlit(wchar_t c,
const char (&s0) [SZ0],
const wchar_t (&s1) [SZ1]) -> const wchar_t(&)[SZ1]
{
return s1;
}
#define stringlit(code_unit, lit) \
_stringlit(code_unit (), lit, CPP_CONCAT(L, lit))
Strictly speaking this still overloads _stringlit, but on the type of an ordinary first parameter rather than through an explicit template argument: there is one function per char encoding, each with a different signature, and each returns the original array type with the original bounds. The selector that chooses the appropriate function is a single character of the desired type (the value of that character is not important). We cannot select through a template parameter alone, because the functions would then differ only in their return types and the call would be ambiguous. This code also works without the constexpr. Note that we are returning a reference to an array (which is possible in C++), not an array (which is not allowed in C++). The trailing return type syntax is optional here, but a lot more readable than the alternative, something like const char (&stringlit(...params here...))[SZ0].
I compiled this with clang 9.0.8 and MSVC++ from Visual Studio 2019 16.7 (aka _MSC_VER 1927 aka pdb ver 14.27). I had c++2a/c++latest enabled, but I think C++14 or 17 is sufficient for this code.
Enjoy!
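For completeness, here is a usage sketch from generic code. It mirrors the functions above but with names that avoid the leading underscore, since such identifiers are reserved at global scope; select_literal, STRING_LIT, PP_CONCAT, and hello_length are all hypothetical names.

```cpp
#include <cstddef>

#define PP_CONCAT_IMPL(x, y) x##y
#define PP_CONCAT(x, y) PP_CONCAT_IMPL(x, y)

// One overload per code-unit type; the first parameter only selects the overload.
template <std::size_t N0, std::size_t N1>
constexpr auto select_literal(char, const char (&s0)[N0], const wchar_t (&)[N1])
    -> const char (&)[N0]
{
    return s0;
}

template <std::size_t N0, std::size_t N1>
constexpr auto select_literal(wchar_t, const char (&)[N0], const wchar_t (&s1)[N1])
    -> const wchar_t (&)[N1]
{
    return s1;
}

#define STRING_LIT(code_unit, lit) \
    select_literal(code_unit(), lit, PP_CONCAT(L, lit))

// Number of code units in "hello" for CharT, excluding the terminator.
template <typename CharT>
constexpr std::size_t hello_length()
{
    return sizeof(STRING_LIT(CharT, "hello")) / sizeof(CharT) - 1;
}
```

Because the result is a reference to the original array, sizeof sees the full literal, which is what makes the length computation above possible.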
Here's an alternative implementation based on @zett42's answer. Please advise me.
#include <iostream>
#include <tuple>
#define TOWSTRING_(x) L##x
#define TOWSTRING(x) TOWSTRING_(x)
#define MAKE_LPCTSTR(C, STR) (std::get<const C*>(std::tuple<const char*, const wchar_t*>(STR, TOWSTRING(STR))))
template<typename CharType>
class StringTraits {
public:
static constexpr const CharType* WHITESPACE_STR = MAKE_LPCTSTR(CharType, "abc");
};
typedef StringTraits<char> AStringTraits;
typedef StringTraits<wchar_t> WStringTraits;
int main(int argc, char** argv) {
std::cout << "Narrow string literal: " << AStringTraits::WHITESPACE_STR << std::endl;
std::wcout << "Wide string literal : " << WStringTraits::WHITESPACE_STR << std::endl;
return 0;
}
I've just come up with a compact answer, which is similar to the other C++17 versions. Like them, it relies on implementation-defined behavior, specifically on the environment's character types. It supports converting ASCII and ISO-8859-1 to UTF-16 wchar_t, UTF-32 wchar_t, UTF-16 char16_t and UTF-32 char32_t. UTF-8 input is not supported, but more elaborate conversion code is feasible.
template <typename Ch, size_t S>
constexpr auto any_string(const char (&literal)[S]) -> const array<Ch, S> {
array<Ch, S> r = {};
for (size_t i = 0; i < S; i++)
r[i] = literal[i];
return r;
}
Full example follows:
$ cat any_string.cpp
#include <array>
#include <fstream>
using namespace std;
template <typename Ch, size_t S>
constexpr auto any_string(const char (&literal)[S]) -> const array<Ch, S> {
array<Ch, S> r = {};
for (size_t i = 0; i < S; i++)
r[i] = literal[i];
return r;
}
int main(void)
{
auto s = any_string<char>("Hello");
auto ws = any_string<wchar_t>(", ");
auto s16 = any_string<char16_t>("World");
auto s32 = any_string<char32_t>("!\n");
ofstream f("s.txt");
f << s.data();
f.close();
wofstream wf("ws.txt");
wf << ws.data();
wf.close();
basic_ofstream<char16_t> f16("s16.txt");
f16 << s16.data();
f16.close();
basic_ofstream<char32_t> f32("s32.txt");
f32 << s32.data();
f32.close();
return 0;
}
$ c++ -o any_string any_string.cpp -std=c++17
$ ./any_string
$ cat s.txt ws.txt s16.txt s32.txt
Hello, World!
A variation of zett42 Method 2 above.
Has the advantage of supporting all char types (for literals that can be represented as char[]) and preserving the proper string literal array type.
First the template functions:
template<typename CHAR_T>
constexpr
auto LiteralChar(
char A,
wchar_t W,
char8_t U8,
char16_t U16,
char32_t U32
) -> CHAR_T
{
if constexpr( std::is_same_v<CHAR_T, char> ) return A;
else if constexpr( std::is_same_v<CHAR_T, wchar_t> ) return W;
else if constexpr( std::is_same_v<CHAR_T, char8_t> ) return U8;
else if constexpr( std::is_same_v<CHAR_T, char16_t> ) return U16;
else if constexpr( std::is_same_v<CHAR_T, char32_t> ) return U32;
}
template<typename CHAR_T, size_t SIZE>
constexpr
auto LiteralStr(
const char (&A) [SIZE],
const wchar_t (&W) [SIZE],
const char8_t (&U8) [SIZE],
const char16_t (&U16)[SIZE],
const char32_t (&U32)[SIZE]
) -> const CHAR_T(&)[SIZE]
{
if constexpr( std::is_same_v<CHAR_T, char> ) return A;
else if constexpr( std::is_same_v<CHAR_T, wchar_t> ) return W;
else if constexpr( std::is_same_v<CHAR_T, char8_t> ) return U8;
else if constexpr( std::is_same_v<CHAR_T, char16_t> ) return U16;
else if constexpr( std::is_same_v<CHAR_T, char32_t> ) return U32;
}
Then the macros:
#define CMK_LC(CHAR_T, LITERAL) \
LiteralChar<CHAR_T>( LITERAL, L ## LITERAL, u8 ## LITERAL, u ## LITERAL, U ## LITERAL )
#define CMK_LS(CHAR_T, LITERAL) \
LiteralStr<CHAR_T>( LITERAL, L ## LITERAL, u8 ## LITERAL, u ## LITERAL, U ## LITERAL )
Then use:
template<typename CHAR_T>
class StringTraits {
public:
struct LC { // literal character
static constexpr CHAR_T Null = CMK_LC(CHAR_T, '\0');
static constexpr CHAR_T Space = CMK_LC(CHAR_T, ' ');
};
struct LS { // literal string
// can't seem to avoid having to specify the size
static constexpr CHAR_T Space [2] = CMK_LS(CHAR_T, " ");
static constexpr CHAR_T Ellipsis [4] = CMK_LS(CHAR_T, "...");
};
};
auto char_space { StringTraits<char>::LC::Space };
auto wchar_space { StringTraits<wchar_t>::LC::Space };
auto char_ellipsis { StringTraits<char>::LS::Ellipsis }; // note: const char*
auto wchar_ellipsis { StringTraits<wchar_t>::LS::Ellipsis }; // note: const wchar_t*
auto (& char_space_array) [4] { StringTraits<char>::LS::Ellipsis };
auto (&wchar_space_array) [4] { StringTraits<wchar_t>::LS::Ellipsis };
(Open question: what is the syntax to get a local copy of the array?)
Admittedly, the syntax to preserve the string literal array type is a bit of a burden, but not overly so.
Again, this only works for literals that have the same number of code units in all char type representations.
For LiteralStr to support all literals for all types, it would likely need to take pointers as parameters and return CHAR_T* instead of CHAR_T(&)[SIZE]. I don't think LiteralChar can be made to support multibyte characters.
[EDIT]
Applying Louis Semprini SIZE support to LiteralStr gives:
template<typename CHAR_T,
size_t SIZE_A, size_t SIZE_W, size_t SIZE_U8, size_t SIZE_U16, size_t SIZE_U32,
size_t SIZE_R =
std::is_same_v<CHAR_T, char> ? SIZE_A :
std::is_same_v<CHAR_T, wchar_t> ? SIZE_W :
std::is_same_v<CHAR_T, char8_t> ? SIZE_U8 :
std::is_same_v<CHAR_T, char16_t> ? SIZE_U16 :
std::is_same_v<CHAR_T, char32_t> ? SIZE_U32 : 0
>
constexpr
auto LiteralStr(
const char (&A) [SIZE_A],
const wchar_t (&W) [SIZE_W],
const char8_t (&U8) [SIZE_U8],
const char16_t (&U16) [SIZE_U16],
const char32_t (&U32) [SIZE_U32]
) -> const CHAR_T(&)[SIZE_R]
{
if constexpr( std::is_same_v<CHAR_T, char> ) return A;
else if constexpr( std::is_same_v<CHAR_T, wchar_t> ) return W;
else if constexpr( std::is_same_v<CHAR_T, char8_t> ) return U8;
else if constexpr( std::is_same_v<CHAR_T, char16_t> ) return U16;
else if constexpr( std::is_same_v<CHAR_T, char32_t> ) return U32;
}
It is also possible to use a simpler syntax to create the variables.
For example, the members of StringTraits::LS can be declared as constexpr auto &,
so that
static constexpr CHAR_T Ellipsis [4] = CMK_LS(CHAR_T, "...");
becomes
static constexpr auto & Ellipsis { CMK_LS(CHAR_T, "...") };
When using CMK_LS(char, "literal"), any invalid characters in the literal are converted to '?' by VS 2019; I am not sure what other compilers do.

Template code and literal strings

I am currently writing some template code where the template parameter is the char type to use. This causes a problem when referring to literal strings. I can, of course, make a struct with the strings I use, but I was wondering whether it would be possible to make something like:
template<typename chartype, char32_t... chars>
struct tr {
/* some magic here */
};
so that tr<char32_t,U"hello world"> would result in U"hello world"
and tr<char16_t,U"hello world"> would result in u"hello world"
and tr<char,U"hello world"> would result in "hello world" (in UTF-8).
The magic should of course correctly translate codes above 0x10000 to lead code and follow code for char16_t and to proper 2, 3 and 4 byte codes for UTF-8
at compile time.
Problem is: how do you define a constant C-style string of a given char type using the char32_t... chars template argument? You can extract the characters
from it but how do you rebuild a new string based on the chars of the input string in template code?
Note, the preprocessor can correctly define a string such as "hello world" with suitable prefix u or U if you like but it cannot access the individual characters of the string to properly translate it.
EDIT:
Strings as template arguments are definitely possible in new C++, however,
the template argument is NOT declared as const char * or something like that:
template <char... txt>
struct foo { ... }
allows you to write foo<"hello"> as a type with the string "hello" as template argument. The problem is how to build the string from those characters.
I mean at some point you want the struct to contain a string value to return:
template <char32_t... txt>
struct foo;
template <>
struct foo<> {
static const char16_t * txt() { return u""; }
};
template <char32_t a, char32_t... rest>
struct foo<a, rest...> {
static const char16_t * txt()
{
char16_t buf[100];
int k = 0;
if (a < 0x10000) buf[k++] = a;
else {
buf[k++] = 0xd800 | ((a - 0x10000) >> 10);
buf[k++] = 0xdc00 | ((a-0x10000) & 0x3ff);
}
// copy rest of string into buf + 2..99
u16strcpy(buf + k, foo<rest...>::txt());
return buf;
}
};
There are several obvious problems with this "solution". One is that buf only has room for 100 characters, which will fail if the string is longer. But the main problem is that I wanted this to happen at compile time, and this looks very much like run-time code to me, which is not at all what I wanted.
Basically I wanted something that works this way:
foo<char, "hello"> results in something that is effectively a string literal
"hello" or u8"hello".
foo<char16_t, "hello"> results in something that is effectively a string literal u"hello" and foo<char32_t, "hello"> results in something that is effectively a string literal U"hello".
The problem is when writing template code to handle various character formats and then having string literals involved. Yes, you can write a simple struct:
template <typename ChT> struct bar;
template <>
struct bar<char32_t> {
static constexpr const char32_t * txta = U"AAAA";
static constexpr const char32_t * txtb = U"BBBB";
};
and so on and bar<char16_t> has txta = u"AAAA" etc. Then refer to the strings
in your templated code by bar<T>::txta etc. However, I wish there was a way that you could specify those strings directly in templated code and the compiler would do the right thing. A templated string literal in other words.
Maybe it should be added as a feature to the language that
T<char32_t> string-literal is the same as U string-literal etc
so that you could write
template <typename ChT>
struct foo {
static const ChT * txta = T<ChT> "AAAAA";
};
and the compiler would do the right thing.
This would appear to simply not be legal, even the following is rejected (vs2017, with standard set to latest):
template<char const * ptr>
struct test
{};
void bar()
{
test<"testing"> t;
}
with the error: invalid expression as a template argument for 'ptr', and if that's not going to work, trying to convert it at compile time is a non-starter. Honestly, it doesn't seem all that surprising that a pointer-to-data isn't constant enough to be a template argument.
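As a side note, the restriction is narrower than it may look: it is string literals specifically that cannot be used as arguments for a pointer template parameter, whereas a named array with static storage duration can. A minimal sketch (the names testing and named_test are made up for the example):

```cpp
// A character array with static storage duration, declared at namespace scope.
constexpr char testing[] = "testing";

template <const char* ptr>
struct test
{
    static constexpr const char* value = ptr;
};

// Accepted: `testing` names an object, not a string literal.
using named_test = test<testing>;
```

This does not solve the original problem, because every string still needs its own named object, but it shows that the pointer itself is not the obstacle.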
Here are some tools to make it work in C++17 (might be portable to C++11 and C++14):
static constexpr data member of templated class
The output literal you wish to work with needs some "storage". I suggest to instantiate a unique class template for each literal, e.g., Literal<char, 'f', 'o', 'o', '\0'>. That class can hold the data as a static constexpr member.
template<class C, C... cs>
struct Literal {
static_assert(sizeof...(cs) >= 1);
static constexpr C data[] = {cs...};// or use `std::array<C, sizeof...(cs)>`
};
template<class C, C... cs>
constexpr C Literal<C, cs...>::data[];
user-defined string literal to simplify syntax
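Of course you wish to avoid typing, e.g., Literal<char, 'f', 'o', 'o', '\0'>. A useful tool to achieve that is the following overload for user-defined string literals. Be aware that this template form of a string literal operator is a GNU extension (accepted by GCC and Clang), not standard C++17.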
template<class C, C... cs>
constexpr Literal<C, cs..., C('\0')> operator""_c() {// or use `auto`
return Literal<C, cs..., C('\0')>{};
}
Note how the input literal is passed as non-type template parameters to that overload. That way, it is possible to "carry the value as a type".
constexpr algorithms for re-encoding
So far, you can type "foo"_c to obtain a Literal<char, 'f', 'o', 'o', '\0'> which has a static constexpr data member yielding the same as "foo". Next you can pass that Literal<char, 'f', 'o', 'o', '\0'> to a function which returns a const char16_t(&)[4] to data of the corresponding Literal<char16_t, ..., '\0'>. The syntax could read tr<char16_t>("foo"_c).
The code that transforms a Literal<char, ...> into the corresponding Literal<char16_t, ...> may be based on constexpr algorithms as shown below.
template<
class OutChar, class InChar, InChar... cs,
std::size_t... input_indices, std::size_t... output_indices
>
constexpr auto& tr_impl(// called by `tr` with appropriate `index_sequence`s
Literal<InChar, cs...>,
std::index_sequence<input_indices...>,
std::index_sequence<output_indices...>
) {
constexpr std::size_t outsize = sizeof...(output_indices);
using Buffer = std::array<OutChar, outsize>;
constexpr Buffer buf = encode_as<OutChar, outsize>({cs...});
return Literal<OutChar, buf[output_indices]...>::data;
}
template<class OutChar, class InChar, InChar... cs>
constexpr auto& tr(Literal<InChar, cs...> literal) {
constexpr std::size_t outsize = count_as<OutChar>({cs...});
return tr_impl<OutChar>(
literal,
std::make_index_sequence<sizeof...(cs)>{},// input indices
std::make_index_sequence<outsize>{}// output indices
);
}
The remaining part would be to implement those functions count_as and encode_as.
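One possible shape for these two functions, restricted to the UTF-32 to UTF-16 direction, is sketched below. The signatures are my assumption, chosen so that the calls count_as<OutChar>({cs...}) and encode_as<OutChar, outsize>({cs...}) from the code above would compile; input validation (for example rejecting lone surrogates) is left out.

```cpp
#include <array>
#include <cstddef>
#include <initializer_list>
#include <type_traits>

// Number of UTF-16 code units needed to encode the given UTF-32 code points.
template <class OutChar>
constexpr std::size_t count_as(std::initializer_list<char32_t> in)
{
    static_assert(std::is_same<OutChar, char16_t>::value,
                  "this sketch only encodes to char16_t");
    std::size_t n = 0;
    for (char32_t c : in)
        n += (c < 0x10000) ? 1 : 2;  // code points above the BMP need a surrogate pair
    return n;
}

// Encodes the UTF-32 input into a fixed-size array of UTF-16 code units.
template <class OutChar, std::size_t OutSize>
constexpr std::array<OutChar, OutSize> encode_as(std::initializer_list<char32_t> in)
{
    std::array<OutChar, OutSize> out{};
    std::size_t k = 0;
    for (char32_t c : in) {
        if (c < 0x10000) {
            out[k++] = static_cast<OutChar>(c);
        } else {
            const char32_t v = c - 0x10000;
            out[k++] = static_cast<OutChar>(0xD800 | (v >> 10));    // lead surrogate
            out[k++] = static_cast<OutChar>(0xDC00 | (v & 0x3FF));  // trail surrogate
        }
    }
    return out;
}
```

A UTF-8 target would follow the same two-pass pattern (count first, then encode), with one to four output units per code point.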
assign to constexpr auto& in final usage
Finally you can assign to constexpr auto& to verify that type and value are equivalent to the plain string literal based on the desired character type.
static_assert(std::size(U"𝄞𝄞") == 3);
static_assert(std::size(u"𝄞𝄞") == 5);
constexpr auto& test = tr<char16_t>(U"𝄞𝄞"_c);
static_assert(std::is_same<decltype(test), const char16_t(&)[5]>{});
for(std::size_t i=0; i<std::size(u"𝄞𝄞"); ++i) {
assert(test[i] == u"𝄞𝄞"[i]);
std::cout << i << ": " << test[i] << std::endl;
}

constexpr conversion of hex chars to std::string

I have a number of strings like this:
"343536"_hex
that I would like to convert into their corresponding byte strings. I am using C++11 and have defined a user-defined string literal to perform the conversion. However, the conversion I currently have cannot be evaluated as a constexpr, which is what I'm seeking. In particular, I would like to use something like this, but as a constexpr:
std::string operator "" _hex(const char *s, std::size_t slen )
{
std::string str;
str.reserve(slen);
char ch[3];
unsigned long num;
ch[2] = '\0';
for ( ; slen >= 2; slen -= 2, s += 2) { // stop cleanly on odd lengths instead of wrapping around
ch[0] = s[0];
ch[1] = s[1];
num = strtoul(ch, NULL, 16);
str.push_back(num);
}
return str;
}
Test driver
int main()
{
std::string src{"653467740035"_hex};
for (const auto &ch : src)
std::cout << std::hex << std::setw(2) << std::setfill('0')
<< (unsigned)ch << '\n';
}
Sample output
65
34
67
74
00
35
The question
To be very, very clear about what I'm asking, it's this:
How can I write a C++11 string literal conversion of this type that can be evaluated at compile time as a constexpr?
In order to achieve what you are trying to do, you will need some compile-time string class that is compatible with constexpr. The standard library does not provide one, though. I can see some things that come close:
boost::mpl::string, but the interface is not really pretty.
boost::log::string_literal, which has the interface you want but lacks the constexpr support.
std::string_literal, which is exactly what you are looking for, but which isn't implemented. It could probably be implemented in C++11, though, if you have some free time.
In order to simplify all of this, let's use an overly simplified string_literal class. Note that some of the classes described above have a trailing \0 to be closer to std::string, but we won't bother to add one.
template<std::size_t N>
struct string_literal
{
char data[N];
};
We will also provide operator+ for concatenation. It would take some time to explain how it works and it's not really relevant for the question. Let's say that it is just some template wizardry (std::integer_sequence is a C++14 utility but can be implemented in C++11):
template<std::size_t N1, std::size_t N2, std::size_t... Ind1, std::size_t... Ind2>
constexpr auto concatenate(string_literal<N1> lhs, string_literal<N2> rhs,
std::index_sequence<Ind1...>, std::index_sequence<Ind2...>)
-> string_literal<N1+N2>
{
return { lhs.data[Ind1]... , rhs.data[Ind2]... };
}
template<std::size_t N1, std::size_t N2>
constexpr auto operator+(string_literal<N1> lhs, string_literal<N2> rhs)
-> string_literal<N1+N2>
{
using Indices1 = std::make_index_sequence<N1>;
using Indices2 = std::make_index_sequence<N2>;
return concatenate(lhs, rhs, Indices1{}, Indices2{});
}
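To make the "template wizardry" a bit more concrete, here is a small self-contained check of the concatenation, assuming C++14 for std::index_sequence. The names ab, c and abc are just for illustration. The two index packs expand into a single initializer list holding the elements of lhs followed by the elements of rhs.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

template<std::size_t N>
struct string_literal { char data[N]; };

template<std::size_t N1, std::size_t N2, std::size_t... Ind1, std::size_t... Ind2>
constexpr string_literal<N1 + N2>
concatenate(string_literal<N1> lhs, string_literal<N2> rhs,
            std::index_sequence<Ind1...>, std::index_sequence<Ind2...>)
{
    // Expands to { lhs.data[0], ..., lhs.data[N1-1], rhs.data[0], ..., rhs.data[N2-1] }.
    return { { lhs.data[Ind1]..., rhs.data[Ind2]... } };
}

template<std::size_t N1, std::size_t N2>
constexpr string_literal<N1 + N2>
operator+(string_literal<N1> lhs, string_literal<N2> rhs)
{
    return concatenate(lhs, rhs,
                       std::make_index_sequence<N1>{},
                       std::make_index_sequence<N2>{});
}

// Illustrative values only.
constexpr string_literal<2> ab{ { 'a', 'b' } };
constexpr string_literal<1> c{ { 'c' } };
constexpr auto abc = ab + c;

static_assert(sizeof(abc.data) == 3, "lengths add up");
static_assert(abc.data[0] == 'a' && abc.data[1] == 'b' && abc.data[2] == 'c',
              "elements keep their order");
```

Since everything is checked with static_assert, the concatenation is demonstrably done at compile time.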
You can use the template user-defined literal (with char...) to get rid of the string literal and have a prettier literal (1234_hex instead of "1234"_hex):
template<char... Chars>
constexpr auto operator "" _hex()
-> string_literal<sizeof...(Chars)/2>
{
return process<Chars...>();
}
Now, all you need is a function that can process your characters in pairs: a generic one and an overload for the "finish" condition. Note that the enable_if_t is needed to avoid ambiguous function calls (that's C++14, but you can replace it by typename std::enable_if<...>::type in C++11). The "real" work of converting the characters to the equivalent numbers is done in the process overload that only takes two template arguments.
template<char C1, char C2>
constexpr auto process()
-> string_literal<1>
{
return { 16 * (C1 - '0') + (C2 - '0') };
}
template<char C1, char C2, char... Rest,
typename = std::enable_if_t< (sizeof...(Rest) > 0), void >>
constexpr auto process()
-> string_literal<sizeof...(Rest)/2 + 1>
{
return process<C1, C2>() + process<Rest...>();
}
You could add many more checks to ensure that there is always an even number of characters or to ensure that there aren't any bad characters. The code I provided uses some features from the C++14 standard library, but I made sure to only use features that can be easily reimplemented in C++11 if needed. Note that you could probably write a more human-readable program with C++14 thanks to the relaxed restrictions on constexpr functions.
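To illustrate both remarks (extra checks, and the more readable C++14 style), here is a sketch of how the pairwise recursion might be replaced by a plain loop. The names hex_value and decode_hex are mine, not part of the code above, and the error handling (a static_assert for the length, a throw for bad digits) is just one possible choice.

```cpp
#include <cassert>
#include <cstddef>

template<std::size_t N>
struct string_literal { char data[N]; };

// Hypothetical helper: value of one hex digit, or -1 for anything else.
constexpr int hex_value(char c)
{
    return (c >= '0' && c <= '9') ? c - '0'
         : (c >= 'a' && c <= 'f') ? c - 'a' + 10
         : (c >= 'A' && c <= 'F') ? c - 'A' + 10
         : -1;
}

// Hypothetical loop-based replacement for the recursive process<...>() overloads.
template<char... Chars>
constexpr string_literal<sizeof...(Chars) / 2> decode_hex()
{
    static_assert(sizeof...(Chars) > 0 && sizeof...(Chars) % 2 == 0,
                  "expected a non-empty, even number of hex digits");
    constexpr char in[] = { Chars... };
    string_literal<sizeof...(Chars) / 2> out{};
    for (std::size_t i = 0; i < sizeof...(Chars) / 2; ++i) {
        const int hi = hex_value(in[2 * i]);
        const int lo = hex_value(in[2 * i + 1]);
        // Reaching a throw during constant evaluation is a compile-time error.
        if (hi < 0 || lo < 0) throw "bad hex digit";
        out.data[i] = static_cast<char>(hi * 16 + lo);
    }
    return out;
}

template<char... Chars>
constexpr auto operator""_hex() -> string_literal<sizeof...(Chars) / 2>
{
    return decode_hex<Chars...>();
}

constexpr auto bytes = 653467740035_hex;
static_assert(sizeof(bytes.data) == 6, "six bytes from twelve digits");
static_assert(bytes.data[0] == 0x65 && bytes.data[4] == 0x00 && bytes.data[5] == 0x35,
              "digits are combined in pairs");
```

One caveat: because 653467740035_hex is a numeric literal, the digits a to f can only appear after a 0x prefix, which this sketch does not strip. Calling decode_hex<'a', 'B'>() directly works, though.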
Here is a working C++14 example with all the aforementioned functions and classes. I made sure that your test program still works (I just replaced src by src.data in the loop since we use a stripped-down string_literal class).

std::end with raw arrays of char

By default, the std::end overload for raw arrays looks something like this:
template<class T, std::size_t N>
T* end(T (&array)[N])
{ return array + N; }
However, this overload was undesirable for me when passing a string literal or a char array, as they both have an implicit \0 at the end which gets counted.
I thought as a workaround I could overload end in my own namespace:
Example:
namespace unique
{
const char* end(const char (&array)[5])
{
return array + 4;
}
const char* end(const char (&array)[11])
{
return array + 10;
}
}
int main()
{
using std::begin;
using std::end;
using unique::end;
const char str1[] = "XXXTEXTXXX";
const char str2[] = "TEXT";
auto iter = std::search(begin(str1), end(str1), begin(str2), end(str2));
//...
}
However, that would require a lot of overloads to write.
QUESTION
I realize using std::string or another container would solve my problem. However, I wanted to be able to call end unqualified with a string literal or raw array and have it omit the null terminator as above. Is there a better approach that would avoid writing the overloads?
The obvious (if you're sure it'll only be used under the right circumstances) would be a variant of the original:
namespace unique {
template<class T, std::size_t N>
T* end(T (&array)[N])
{ return array + N-1; }
}
Just be sure to only use it in the right situations, or you'll end up with an end that points one element before the actual end.
If you want to restrict it to character types, you can use a couple of overloads that only work for character types:
namespace unique {
template <std::size_t N>
char *end(char (&array)[N])
{
return array + N - 1;
}
template <std::size_t N>
wchar_t *end(wchar_t (&array)[N])
{
return array + N - 1;
}
}
With this, an array of char will use the version that assumes a NUL terminator, but an array of int will use std::end so it refers to the entire array:
int main()
{
using std::begin;
using std::end;
using unique::end;
char str1 [] = "12345";
wchar_t str2 [] = L"12345";
int i4 [] = { 1, 2, 3, 4, 5, 6 };
std::cout << std::distance(begin(str1), end(str1)) << "\n";
std::cout << std::distance(begin(str2), end(str2)) << "\n";
std::cout << std::distance(begin(i4), end(i4)) << "\n";
}
Do note, however, that since there's an existing template named end (std::end), these overloads will only match exact types, so if you want them to work with const char and const wchar_t (for example) those will require separate overloads from the ones above that work with non-const types.
Also note that these will still apply to typedefs, so (for example) the fairly common:
typedef char small_int;
...can/will lead to problems -- the real type is still char, so end(my_small_int_array) will use the char overload instead of the base template.
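For completeness, the separate const overloads mentioned above might look like the sketch below. The helper visible_length is only there to exercise the overloads and is not part of the answer. Each overload is more specialized than the generic std::end template, so it wins overload resolution for character arrays, while other element types still fall through to std::end.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

namespace unique {
// Non-const overloads, as shown earlier.
template <std::size_t N>
char* end(char (&array)[N]) { return array + N - 1; }
template <std::size_t N>
wchar_t* end(wchar_t (&array)[N]) { return array + N - 1; }

// const counterparts, needed for string literals and const arrays.
template <std::size_t N>
const char* end(const char (&array)[N]) { return array + N - 1; }
template <std::size_t N>
const wchar_t* end(const wchar_t (&array)[N]) { return array + N - 1; }
}

// Hypothetical helper: distance from begin to end using unqualified calls.
template <class Array>
std::ptrdiff_t visible_length(Array& a)
{
    using std::begin;
    using std::end;
    using unique::end;
    return std::distance(begin(a), end(a));
}
```

With this, visible_length applied to a string literal or a const char array leaves out the terminator, and an array of int is still measured in full.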