constexpr conversion of hex chars to std::string - c++

I have a number of strings like this:
"343536"_hex
that I would like to convert into their corresponding byte strings. I am using C++11 and have defined a user-defined string literal operator to do this conversion. However, my current conversion cannot be evaluated as a constexpr, which is what I'm after. In particular, I would like to use something like this, but as a constexpr:
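#include <cstddef>
#include <cstdlib>
#include <string>

// Runtime version: decodes each pair of hex digits with strtoul.
// Assumes every character is a hex digit; a trailing odd digit is ignored.
std::string operator "" _hex(const char *s, std::size_t slen)
{
    std::string str;
    str.reserve(slen / 2);      // two hex digits per byte
    char ch[3];
    unsigned long num;
    ch[2] = '\0';
    for ( ; slen >= 2; slen -= 2, s += 2) {
        ch[0] = s[0];
        ch[1] = s[1];
        num = std::strtoul(ch, nullptr, 16);
        str.push_back(static_cast<char>(num));
    }
    return str;
}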
Test driver
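#include <iomanip>
#include <iostream>

int main()
{
    std::string src{"653467740035"_hex};
    for (const auto &ch : src)
        // go through unsigned char so bytes >= 0x80 are not sign-extended
        std::cout << std::hex << std::setw(2) << std::setfill('0')
                  << static_cast<unsigned>(static_cast<unsigned char>(ch)) << '\n';
}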
Sample output
65
34
67
74
00
35
The question
To be very, very clear about what I'm asking, it's this:
How can I write a C++11 string literal conversion of this type that can be evaluated at compile time as a constexpr?

In order to achieve what you are trying to do, you will need to have some compile-time string class that is compatible with constexpr. There isn't such a standard thing though. I can see some things that approach it:
boost::mpl::string, but the interface is not really pretty.
boost::log::string_literal, which has the interface you want but lacks the constexpr support.
std::string_literal, which is exactly what you are looking for, but which was only ever a proposal and isn't implemented. It could probably be implemented in C++11, though, if you have some free time.
In order to simplify all of this, let's use an overly simplified string_literal class. Note that some of the classes described above have a trailing \0 to be closer to std::string, but we won't bother to add one.
template<std::size_t N>
struct string_literal
{
char data[N];
};
We will also provide operator+ for concatenation. It would take some time to explain how it works and it's not really relevant for the question. Let's say that it is just some template wizardry (std::integer_sequence is a C++14 utility but can be implemented in C++11):
template<std::size_t N1, std::size_t N2, std::size_t... Ind1, std::size_t... Ind2>
constexpr auto concatenate(string_literal<N1> lhs, string_literal<N2> rhs,
std::index_sequence<Ind1...>, std::index_sequence<Ind2...>)
-> string_literal<N1+N2>
{
return { lhs.data[Ind1]... , rhs.data[Ind2]... };
}
template<std::size_t N1, std::size_t N2>
constexpr auto operator+(string_literal<N1> lhs, string_literal<N2> rhs)
-> string_literal<N1+N2>
{
using Indices1 = std::make_index_sequence<N1>;
using Indices2 = std::make_index_sequence<N2>;
return concatenate(lhs, rhs, Indices1{}, Indices2{});
}
You can use a literal operator template (with char...) to get rid of the quotes and have a prettier literal (1234_hex instead of "1234"_hex). Note that this form only works for numeric literals, so letters such as a-f cannot appear as plain digit pairs:
template<char... Chars>
constexpr auto operator "" _hex()
    -> string_literal<sizeof...(Chars)/2>
{
    // process (below) must be declared before this operator in real code
    return process<Chars...>();
}
Now, all you need is a function that can process your characters by pairs. A generic one and an overload for the "finish" condition. Note that the enable_if_t is needed to avoid ambiguous function calls (that's C++14, but you can replace it by typename std::enable_if<...>::type in C++11). The "real" work of converting the characters to the equivalent numbers is done in the process overload that only takes two template arguments.
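template<char C1, char C2>
constexpr auto process()
    -> string_literal<1>
{
    // Digits '0'-'9' only; the cast avoids a narrowing error for pairs
    // above 0x7f when char is signed
    return { static_cast<char>(16 * (C1 - '0') + (C2 - '0')) };
}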
template<char C1, char C2, char... Rest,
typename = std::enable_if_t< (sizeof...(Rest) > 0), void >>
constexpr auto process()
-> string_literal<sizeof...(Rest)/2 + 1>
{
return process<C1, C2>() + process<Rest...>();
}
You could add many more checks to ensure that there is always an even number of characters or to ensure that there aren't any bad characters. The code I provided uses some features from the C++14 standard library, but I made sure to only use features that can be easily reimplemented in C++11 if needed. Note that you could probably write a more human-readable program with C++14 thanks to the relaxed restrictions on constexpr functions.
Here is a working C++14 example with all the aforementioned functions and classes. I made sure that your test program still works (I just replaced src by src.data in the loop, since we use a simplified string_literal class).
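As a rough self-contained sketch, the pieces above can be assembled into one C++14 translation unit in declaration order (operator+ before process, process before operator""_hex). It keeps the digits-only assumption, uses typename std::enable_if<...>::type so that only std::index_sequence needs C++14, and adds an even-length static_assert that is not part of the answer above.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Minimal compile-time byte string (no trailing '\0').
template<std::size_t N>
struct string_literal { char data[N]; };

template<std::size_t N1, std::size_t N2, std::size_t... I1, std::size_t... I2>
constexpr string_literal<N1 + N2> concatenate(string_literal<N1> lhs, string_literal<N2> rhs,
                                              std::index_sequence<I1...>, std::index_sequence<I2...>)
{
    return {{ lhs.data[I1]..., rhs.data[I2]... }};
}

template<std::size_t N1, std::size_t N2>
constexpr string_literal<N1 + N2> operator+(string_literal<N1> lhs, string_literal<N2> rhs)
{
    return concatenate(lhs, rhs, std::make_index_sequence<N1>{}, std::make_index_sequence<N2>{});
}

// Value of one decimal digit; the numeric-literal form only carries '0'-'9'.
constexpr int digit(char c) { return c - '0'; }

template<char C1, char C2>
constexpr string_literal<1> process()
{
    return {{ static_cast<char>(16 * digit(C1) + digit(C2)) }};
}

template<char C1, char C2, char... Rest,
         typename = typename std::enable_if<(sizeof...(Rest) > 0)>::type>
constexpr string_literal<sizeof...(Rest) / 2 + 1> process()
{
    return process<C1, C2>() + process<Rest...>();
}

template<char... Chars>
constexpr string_literal<sizeof...(Chars) / 2> operator""_hex()
{
    static_assert(sizeof...(Chars) % 2 == 0, "need an even number of digits");
    return process<Chars...>();
}

// Decoded entirely at compile time.
constexpr auto src = 653467740035_hex;
static_assert(src.data[0] == 0x65 && src.data[5] == 0x35, "decoded at compile time");
```

The test driver then iterates over src.data instead of src.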

Related

C++17 - constexpr byte iteration for trivial types

Given my prototype for a simple hashing method:
template <typename _iterator>
constexpr uint32_t hash_impl(_iterator buf, size_t len);
Generating constexpr hashes for simple strings is pretty trivial:
template <char const* str>
constexpr uint32_t generate()
{
constexpr std::string_view strView = str;
return hash_impl(str, strView.size());
}
constexpr uint32_t generate(const std::string_view& str)
{
return hash_impl(str.begin(), str.size());
}
constexpr static char str1[] = "Hello World!";
constexpr uint32_t hash1 = generate<str1>();
constexpr std::string_view str2("Hello World!");
constexpr uint32_t hash2 = generate(str2);
I'd also like to generate constexpr hashes for a variety of simple types (PODs and trivial structs). However, I'm not sure how to get the byte representation of these types in a constexpr-friendly way.
My naive implementation:
template <typename T, typename = std::enable_if_t< std::is_standard_layout_v<T> && !std::is_pointer_v<T> >>
constexpr uint32_t generate(const T& value)
{
return hash_impl(reinterpret_cast<const std::byte*>(&value), sizeof(T));
}
falls over because the reinterpret_cast<> breaks the constexpr rules. I've searched for a workaround, but other answers on the site indicate that it's not possible.
Worst case scenario, I can manually check for and hash specific types like so:
template <typename T, typename = std::enable_if_t< std::is_standard_layout_v<T> && !std::is_pointer_v<T> >>
constexpr uint32_t generate(const T& value)
{
if constexpr (std::is_same_v<T, int32_t> || std::is_same_v<T, uint32_t>)
{
char buf[] =
{
static_cast<char>(value >> 0),
static_cast<char>(value >> 8),
static_cast<char>(value >> 16),
static_cast<char>(value >> 24)
};
return hash_impl(buf, 4);
}
else if constexpr (/* etc... */)
// ...
}
but this falls over as soon as I try to implement it for something like a float (which has no bitwise operators) or for trivial custom types (e.g., struct foo { int a; };) unless I write an extra code block for foo.
I feel like I must be overlooking something simple, but I can't find any standard library utility or think of any fancy template trick that would suit. Perhaps there's something in C++20 or C++23 that I'm not aware of?
It is not possible in C++17. In C++20 you can use std::bit_cast to get the object representation of a trivially copyable type:
auto object_representation = std::bit_cast<std::array<std::byte, sizeof(T)>>(value);
(Technically you can argue about whether or not it is guaranteed that std::array has no additional padding/members that would make this ill-formed, but in practice that is not a concern.)
You can then pass the array as a pointer/size pair to the hash implementation. You should probably also add
static_assert(std::has_unique_object_representations_v<T>);
If the assertion fails you will have no guarantee that objects with same value will have the same object representation and same hash.
Btw. std::is_standard_layout_v is not the property you need here. The property that you need to verify is std::is_trivially_copyable_v. Being standard-layout is neither sufficient nor necessary to be able to inspect and use the object representation like this.

Check length of string literal at compile time

I would like to check length of my string literals at compile time. For now I am thinking about the following construction, but can't complete it:
#define STR(s) (sizeof(s) < 10 ? s : /* somehow perform static_assert */)
void foo(const char* s) {}
int main() {
foo(STR("abc")); // foo("abc")
foo(STR("abcabcabcabc")); // compile time error: "String exceeds 10 bytes!"
}
This is C++, where there are better options than macros. A template can give you the exact semantics you want.
template<std::size_t N>
constexpr auto& STR(char const (&s)[N]) {
static_assert(N < 10, "String exceeds 10 bytes!");
// < 11 if you meant 10 characters. There is a trailing `\0`
// in every literal, even if we don't explicitly specify it
return s;
}
The array reference argument will bind to string literals, not pointers (that can trip your macro), deduce their size, and perform the check in the body of the function. Then it will return the reference unchanged if everything checks out, allowing even for continued overload resolution.
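A short sketch of that property: because STR returns the reference unchanged, the result still has array type, so its size is visible at the call site and it can initialize a constexpr reference. The name greeting is illustrative. A call such as STR("abcabcabcabc") would stop compilation at the static_assert, so it is left out.

```cpp
#include <cstddef>

template<std::size_t N>
constexpr auto& STR(char const (&s)[N]) {
    static_assert(N < 10, "String exceeds 10 bytes!");
    return s;                     // still char const (&)[N]
}

// The array type survives the call, so sizeof sees 3 characters plus '\0'.
static_assert(sizeof(STR("abc")) == 4, "array type preserved");

// The string literal has static storage, so the reference is a constant.
constexpr auto& greeting = STR("abc");
```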
To add to StoryTeller - Unslander Monica's great answer:
If, like me, you need to pass an argument for the string's maximum length, you can make the implementation more generic:
template<std::size_t MAX_LEN, std::size_t N>
constexpr auto& STR(char const (&s)[N])
{
    static_assert(N < MAX_LEN, "String overflow!");
    return s;
}
And if you need several known lengths, you can write thin wrappers around it:
template<std::size_t N>
constexpr auto & STR16(char const (&s)[N])
{
return STR<16>(s);
}
The first function is the generic version, whereas the second can use the project's constants.
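A sketch of the wrapper approach with a named project constant. Unlike the N < MAX_LEN check above, this variant assumes the limit counts visible characters (N - 1, excluding the terminating '\0'); project::name_limit and NAME are hypothetical names.

```cpp
#include <cstddef>

namespace project {                           // hypothetical project-wide limits
constexpr std::size_t name_limit = 15;
}

template<std::size_t MaxChars, std::size_t N>
constexpr auto& STR(char const (&s)[N])
{
    // N includes the terminating '\0', so N - 1 is the visible length.
    static_assert(N - 1 <= MaxChars, "String overflow!");
    return s;
}

// Wrapper bound to one project constant; N is still deduced per call.
template<std::size_t N>
constexpr auto& NAME(char const (&s)[N])
{
    return STR<project::name_limit>(s);
}

// 15 visible characters: exactly at the limit, so this compiles.
static_assert(sizeof(NAME("exactly15chars!")) == 16, "15 characters fit");
```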

Template code and literal strings

I am currently writing some template code where the template parameter is the char type to use. This causes a problem when referring to literal strings. I could, of course, make a struct with the strings I use, but I was wondering whether it would be possible to write something like:
template<typename chartype, char32_t... chars>
struct tr {
/* some magic here */
};
so that tr<char32_t,U"hello world"> would result in U"hello world"
and tr<char16_t,U"hello world"> would result in u"hello world"
and tr<char,U"hello world"> would result in "hello world" (in UTF-8).
The magic should of course correctly translate code points at or above 0x10000 into a lead (high) and trail (low) surrogate for char16_t, and into the proper 2-, 3- and 4-byte sequences for UTF-8, all at compile time.
Problem is: how do you define a constant C-style string of a given char type using the char32_t... chars template argument? You can extract the characters from it, but how do you rebuild a new string from the characters of the input string in template code?
Note, the preprocessor can correctly define a string such as "hello world" with suitable prefix u or U if you like but it cannot access the individual characters of the string to properly translate it.
EDIT:
Strings as template arguments are definitely possible in new C++, however,
the template argument is NOT declared as const char * or something like that:
template <char... txt>
struct foo { ... }
allows you to write foo<"hello"> as a type with the string "hello" as template argument. The problem is how to build the string from those characters.
I mean at some point you want the struct to contain a string value to return:
template <char32_t... txt>
struct foo;

template <>
struct foo<> {
    static const char16_t * txt() { return u""; }
};

template <char32_t a, char32_t... rest>
struct foo<a, rest...> {
    static const char16_t * txt()
    {
        char16_t buf[100];
        int k = 0;
        if (a < 0x10000) buf[k++] = a;
        else {
            buf[k++] = 0xd800 | ((a - 0x10000) >> 10);
            buf[k++] = 0xdc00 | ((a - 0x10000) & 0x3ff);
        }
        // copy rest of string into buf + k..99 (u16strcpy is not a standard function)
        u16strcpy(buf + k, foo<rest...>::txt());
        return buf;   // dangling: buf is a local array
    }
};
There are several obvious problems with this "solution". One is that buf only has room for 100 characters, which will fail if the string is longer, but the main problem is that I wanted this to happen at compile time, and this looks very much like run-time code to me, which is not at all what I wanted.
Basically I wanted something that works this way:
foo<char, "hello"> results in something that is effectively a string literal "hello" or u8"hello".
foo<char16_t, "hello"> results in something that is effectively a string literal u"hello" and foo<char32_t, "hello"> results in something that is effectively a string literal U"hello".
The problem is when writing template code to handle various character formats and then having string literals involved. Yes, you can write a simple struct:
template <typename ChT> struct bar;
template <>
struct bar<char32_t> {
static constexpr const char32_t * txta = U"AAAA";
static constexpr const char32_t * txtb = U"BBBB";
};
and so on, and bar<char16_t> has txta = u"AAAA" etc. Then refer to the strings in your templated code by bar<T>::txta etc. However, I wish there were a way to specify those strings directly in templated code and have the compiler do the right thing. A templated string literal, in other words.
Maybe it should be added as a language feature that T<char32_t> string-literal is the same as U string-literal etc.,
so that you could write
template <typename ChT>
struct foo {
static const ChT * txta = T<ChT> "AAAAA";
};
and the compiler would do the right thing.
This does not appear to be legal; even the following is rejected (VS2017, with the standard set to latest):
template<char const * ptr>
struct test
{};
void bar()
{
test<"testing"> t;
}
with the error: invalid expression as a template argument for 'ptr'. If that doesn't work, trying to convert the string at compile time is a non-starter. And honestly, this isn't all that surprising: the standard explicitly forbids the address of a string literal as a template argument.
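For comparison, a sketch of what the compiler does accept: a pointer template parameter may point to a named array with static storage duration (the hashing question above does the same with str1), just not to a string literal. The names testing and value are illustrative.

```cpp
// A pointer non-type template parameter, as in the rejected example.
template<char const* ptr>
struct test {
    static constexpr char const* value = ptr;
};

// Named object with static storage duration, unlike the literal "testing".
static constexpr char testing[] = "testing";

using accepted = test<testing>;      // OK
// using rejected = test<"testing">; // still ill-formed: string literal argument
```

This gives the characters a home, but it does not by itself re-encode them; that still needs something like the approach in the following answer.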
Here are some tools to make it work in C++17 (might be portable to C++11 and C++14):
static constexpr data member of templated class
The output literal you wish to work with needs some "storage". I suggest instantiating a unique class template for each literal, e.g., Literal<char, 'f', 'o', 'o', '\0'>. That class can hold the data as a static constexpr member.
template<class C, C... cs>
struct Literal {
static_assert(sizeof...(cs) >= 1);
static constexpr C data[] = {cs...};// or use `std::array<C, sizeof...(cs)>`
};
template<class C, C... cs>
constexpr C Literal<C, cs...>::data[];
user-defined string literal to simplify syntax
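Of course you wish to avoid typing, e.g., Literal<char, 'f', 'o', 'o', '\0'>. A useful tool to achieve that is the following overload for user-defined string literals. Be aware that a string literal operator template of this form (template<class C, C... cs>) is a GNU extension supported by GCC and Clang, not standard C++.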
template<class C, C... cs>
constexpr Literal<C, cs..., C('\0')> operator""_c() {// or use `auto`
return Literal<C, cs..., C('\0')>{};
}
Note how the input literal is passed as non-type template parameters to that overload. That way, it is possible to "carry the value as a type".
constexpr algorithms for re-encoding
So far, you can type "foo"_c to obtain a Literal<char, 'f', 'o', 'o', '\0'> which has a static constexpr data member yielding the same as "foo". Next you can pass that Literal<char, 'f', 'o', 'o', '\0'> to a function that returns a reference of type const char16_t(&)[4] to the data member of the corresponding Literal<char16_t, ..., '\0'>. The syntax could read tr<char16_t>("foo"_c).
The code that transforms a Literal<char, ...> into the corresponding Literal<char16_t, ...> may be based on constexpr algorithms as shown below.
template<
class OutChar, class InChar, InChar... cs,
std::size_t... input_indices, std::size_t... output_indices
>
constexpr auto& tr_impl(// called by `tr` with appropriate `index_sequence`s
Literal<InChar, cs...>,
std::index_sequence<input_indices...>,
std::index_sequence<output_indices...>
) {
constexpr std::size_t outsize = sizeof...(output_indices);
using Buffer = std::array<OutChar, outsize>;
constexpr Buffer buf = encode_as<OutChar, outsize>({cs...});
return Literal<OutChar, buf[output_indices]...>::data;
}
template<class OutChar, class InChar, InChar... cs>
constexpr auto& tr(Literal<InChar, cs...> literal) {
constexpr std::size_t outsize = count_as<OutChar>({cs...});
return tr_impl<OutChar>(
literal,
std::make_index_sequence<sizeof...(cs)>{},// input indices
std::make_index_sequence<outsize>{}// output indices
);
}
The remaining part would be to implement those functions count_as and encode_as.
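A possible sketch of count_as and encode_as, assuming UTF-32 (char32_t) input and covering only char16_t and char32_t output; UTF-8 output and rejection of invalid input (e.g. lone surrogates) are left out. Both take a std::initializer_list so that the calls count_as<OutChar>({cs...}) and encode_as<OutChar, outsize>({cs...}) in tr and tr_impl compile as written. C++17 is needed for the constexpr non-const std::array::operator[].

```cpp
#include <array>
#include <cstddef>
#include <initializer_list>
#include <type_traits>

// Number of OutChar code units needed for the code points in `in`.
template<class OutChar, class InChar>
constexpr std::size_t count_as(std::initializer_list<InChar> in) {
    static_assert(std::is_same<InChar, char32_t>::value, "sketch expects UTF-32 input");
    static_assert(std::is_same<OutChar, char16_t>::value || std::is_same<OutChar, char32_t>::value,
                  "sketch covers UTF-16 and UTF-32 output only");
    std::size_t n = 0;
    for (char32_t c : in)
        n += (std::is_same<OutChar, char16_t>::value && c >= 0x10000) ? 2 : 1;
    return n;
}

// Re-encode `in` into exactly OutSize code units of type OutChar.
template<class OutChar, std::size_t OutSize, class InChar>
constexpr std::array<OutChar, OutSize> encode_as(std::initializer_list<InChar> in) {
    static_assert(std::is_same<InChar, char32_t>::value, "sketch expects UTF-32 input");
    std::array<OutChar, OutSize> out{};
    std::size_t k = 0;
    for (char32_t c : in) {
        if (std::is_same<OutChar, char16_t>::value && c >= 0x10000) {
            c -= 0x10000;                                           // 20 bits remain
            out[k++] = static_cast<OutChar>(0xD800 + (c >> 10));   // lead (high) surrogate
            out[k++] = static_cast<OutChar>(0xDC00 + (c & 0x3FF));  // trail (low) surrogate
        } else {
            out[k++] = static_cast<OutChar>(c);                     // one unit per code point
        }
    }
    return out;
}
```

For U"𝄞𝄞" (two U+1D11E plus the terminator), count_as<char16_t> yields 5, which matches the const char16_t(&)[5] checked below.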
assign to constexpr auto& in final usage
Finally you can assign to constexpr auto& to verify that type and value are equivalent to the plain string literal based on the desired character type.
static_assert(std::size(U"𝄞𝄞") == 3);
static_assert(std::size(u"𝄞𝄞") == 5);
constexpr auto& test = tr<char16_t>(U"𝄞𝄞"_c);
static_assert(std::is_same<decltype(test), const char16_t(&)[5]>{});
for(std::size_t i=0; i<std::size(u"𝄞𝄞"); ++i) {
assert(test[i] == u"𝄞𝄞"[i]);
std::cout << i << ": " << test[i] << std::endl;
}

literal charT array

I'm working on some API for algorithm involving text.
I would like to make it NOT dependent on the character type (char,wchar_t...), so I have made template classes with a template parameter CharT.
These classes use std::basic_string<CharT>.
I have to initialize a lot of basic_string with default values.
If CharT is char I can assign the literal "default_text", or if CharT is wchar_t I can assign L"default_text", but this is not generic (it is CharT-dependent).
Can you think of any way to initialize the basic_string generically?
If that may help, my code is in C++11.
Since your code is generic, I guess that the literal you have only contains ASCII characters. Otherwise, you'd have to transcode it on the fly which is going to be a lot of hassle. In order to promote a pure-ASCII string literal of type char[] to another character type, you can simply promote each character individually.
If you're going to initialize a std::basic_string anyway, you can as well do it right away. The following function takes a char[] string literal and a target type and promotes it to a string of that type.
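#include <cstring>
#include <string>

template <typename CharT>
std::basic_string<CharT>   // explicit return type keeps this valid C++11
as_string(const char *const text)
{
  const auto length = std::strlen(text);
  auto string = std::basic_string<CharT> {};
  string.resize(length);
  for (auto i = std::size_t {}; i < length; ++i)
    string[i] = static_cast<CharT>(text[i]);  // ASCII only: promote each char
  return string;
}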
It can be used like this.
std::cout << as_string<char>("The bats are in the belfry") << '\n';
std::wcout << as_string<wchar_t>("The dew is on the moor") << L'\n';
But you've asked for a character array, not a std::basic_string. In C++14, constexpr can help a lot with this. Be warned that you'd need the most recent compilers for this to be supported well.
The first thing we have to do is roll our own version of std::array that provides constexpr operations. You can get as fancy as you want, but I'll keep it simple here.
template <typename T, std::size_t N>
struct array { T data[N]; };
Next, we also need a constexpr version of std::strlen.
template <typename CharT>
constexpr auto
cstrlen(const CharT *const text) noexcept
{
auto length = std::size_t {};
for (auto s = text; *s != CharT {0}; ++s)
++length;
return length;
}
Now we can write a constexpr function that promotes a string literal for us.
template <typename CharT, std::size_t Length>
constexpr auto
as_array(const char *const text)
{
auto characters = array<CharT, Length + 1> {};
if (cstrlen(text) != Length)
throw std::invalid_argument {"Don't lie about the length!"};
for (auto i = std::size_t {}; i < Length; ++i)
characters.data[i] = text[i];
characters.data[Length] = CharT {0};
return characters;
}
It might be convenient to wrap it into a macro. I'm sorry for that.
#define AS_ARRAY(Type, Text) as_array<Type, cstrlen(Text)>(Text).data
It can be used like this.
std::cout << AS_ARRAY(char, "The bats are in the belfry") << '\n';
std::wcout << AS_ARRAY(wchar_t, "The dew is on the moor") << L'\n';
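To confirm that as_array really runs at compile time, its result can initialize a constexpr variable. This C++14 sketch repeats the helpers so it compiles on its own, with one change: the element copy uses an explicit static_cast<CharT>. The string "default_text" comes from the question; the variable name wide is illustrative.

```cpp
#include <cstddef>
#include <stdexcept>

template <typename T, std::size_t N>
struct array { T data[N]; };

template <typename CharT>
constexpr std::size_t cstrlen(const CharT *const text) noexcept
{
    std::size_t length = 0;
    for (auto s = text; *s != CharT{0}; ++s)
        ++length;
    return length;
}

template <typename CharT, std::size_t Length>
constexpr array<CharT, Length + 1> as_array(const char *const text)
{
    auto characters = array<CharT, Length + 1>{};
    if (cstrlen(text) != Length)
        throw std::invalid_argument{"Don't lie about the length!"};  // never reached in a valid constant evaluation
    for (std::size_t i = 0; i < Length; ++i)
        characters.data[i] = static_cast<CharT>(text[i]);  // ASCII promotion
    characters.data[Length] = CharT{0};
    return characters;
}

// A constexpr variable forces compile-time evaluation of as_array.
constexpr auto wide = as_array<char16_t, cstrlen("default_text")>("default_text");
```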

Can I mix compile time string comparison with MPL templates?

I got this compile time string comparison from another thread using constexpr and C++11 (http://stackoverflow.com/questions/5721813/compile-time-assert-for-string-equality). It works with constant strings like "OK"
constexpr bool isequal(char const *one, char const *two) {
return (*one && *two) ? (*one == *two && isequal(one + 1, two + 1))
: (!*one && !*two);
}
I am trying to use it in the following context:
static_assert(isequal(boost::mpl::c_str<boost::mpl::string<'ak'>>::value, "ak"), "should not fail");
But it gives me a compilation error: static_assert expression is not an integral constant expression.
Can I do this?
The problem is that the value member of mpl::c_str is not marked as constexpr. Until the library authors decide to include support for constexpr, you are pretty much screwed, unless you are willing to modify your Boost code (or create your own version of c_str). If you decide to do so, the modification is quite simple: you just need to locate BOOST_ROOT/boost/mpl/string.hpp and replace this
template<typename Sequence>
struct c_str
{
...
static typename Sequence::value_type const value[BOOST_MPL_LIMIT_STRING_SIZE+1];
};
template<typename Sequence>
typename Sequence::value_type const c_str<Sequence>::value[BOOST_MPL_LIMIT_STRING_SIZE+1] =
{
#define M0(z, n, data) \
mpl::aux_::deref_unless<BOOST_PP_CAT(i, n), iend>::type::value,
BOOST_PP_REPEAT(BOOST_MPL_LIMIT_STRING_SIZE, M0, ~)
#undef M0
'\0'
};
by this
template<typename Sequence>
struct c_str
{
...
static constexpr typename Sequence::value_type value[BOOST_MPL_LIMIT_STRING_SIZE+1] =
{
#define M0(z, n, data) \
mpl::aux_::deref_unless<BOOST_PP_CAT(i, n), iend>::type::value,
BOOST_PP_REPEAT(BOOST_MPL_LIMIT_STRING_SIZE, M0, ~)
#undef M0
'\0'
};
};
// definition still needed
template<typename Sequence>
constexpr typename Sequence::value_type c_str<Sequence>::value[BOOST_MPL_LIMIT_STRING_SIZE+1];
Hmm, after digging a bit more, it turns out the problem is more complex than I thought. Static integral constants can be used in constant expressions, so my next suspicion was the array: c_str<T>::value is an array and your function takes pointers as parameters, so the compiler needs to decay the array, which boils down to taking the address of its first element, and I assumed such an address could not be used in a constant expression.
To solve the issue, I tried to write a second version of isequal that operates on arrays rather than on pointers:
template <int N, int M>
constexpr bool isequal(char const (&one)[N], char const (&two)[M], int index)
{
return (one[index] && two[index]) ?
(one[index] == two[index] && isequal(one, two, index + 1)) :
(!one[index] && !two[index]);
}
template <int N, int M>
constexpr bool isequal(char const (&one)[N], char const (&two)[M])
{
// note: we can't check whether N == M since the size of the array
// can be greater than the actual size of the null-terminated string
return isequal(one, two, 0);
}
constexpr char hello[] = "hello";
static_assert(isequal(hello, hello), "hello == hello");
constexpr char zello[] = "zello";
static_assert(!isequal(hello, zello), "hello != zello");
constexpr char hel[] = "hel";
static_assert(!isequal(hello, hel), "hello != hel");
Unfortunately, this code does not work with mpl::c_str either; in fact, the problem is that static const arrays are not compile-time values, unlike integral constants. So we're back to the beginning: unless value is marked as constexpr, there is no way to use it in a constant expression.
As to why the code I gave initially fails, I can't answer right now as my version of gcc (4.6) fails to compile it altogether...
After updating gcc, it turns out value needs to be defined outside the class, even though it is declared and initialized in the class (see this question). I edited the code above with the correction.
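A C++11 sketch of that fix, using a hypothetical stand-in for mpl::c_str rather than the Boost header itself (the chars template is illustrative). Once the static array is constexpr and defined outside the class, even the original pointer-based isequal accepts it, because reading a constexpr array through a decayed pointer is allowed in a constant expression.

```cpp
// Original pointer-based comparison, unchanged.
constexpr bool isequal(char const *one, char const *two) {
    return (*one && *two) ? (*one == *two && isequal(one + 1, two + 1))
                          : (!*one && !*two);
}

// Hypothetical c_str-like holder with a constexpr, null-terminated array.
template<char... Cs>
struct chars {
    static constexpr char value[sizeof...(Cs) + 1] = { Cs..., '\0' };
};

// Out-of-class definition, required before C++17 when value is odr-used.
template<char... Cs>
constexpr char chars<Cs...>::value[sizeof...(Cs) + 1];

static_assert(isequal(chars<'a', 'k'>::value, "ak"), "should not fail");
```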