Detecting string literals at compile time (C++)

I have a class to wrap string literals and calculate the size at compile time.
The constructor looks like this:
template< std::size_t N >
Literal( const char (&literal)[N] );
// used like this
Literal greet( "Hello World!" );
printf( "%s, length: %zu", greet.c_str(), greet.size() );
There is a problem with the code, however. The following code compiles, and I would like to make it an error.
char broke[] = { 'a', 'b', 'c' };
Literal l( broke );
Is there a way to restrict the constructor so that it only accepts C string literals? Compile-time detection is preferred, but a runtime check is acceptable if there is no better way.

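There is a way to force a string literal argument: make a user-defined literal operator. You can make the operator constexpr to get the size at compile time (this assumes Literal has a constexpr constructor taking a pointer and a length):
constexpr Literal operator "" _suffix(const char* str, std::size_t len) {
    return Literal(str, len);
}
When this answer was written, I didn't know of any compiler that implemented this feature. User-defined literals have since been supported by GCC 4.7, Clang 3.1 and Visual C++ 2015 onward.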
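A complete, compilable sketch of the user-defined literal approach follows. The Literal class, its (pointer, length) constructor, and the suffix name _lit are assumptions for illustration. The compiler passes the literal's characters and its length (excluding the terminating '\0') to the operator, and the operator can only be invoked on a string literal.

```cpp
#include <cstddef>

// Hypothetical minimal Literal: stores a pointer and a length.
class Literal {
public:
    constexpr Literal(const char* str, std::size_t len) : str_(str), len_(len) {}
    constexpr const char* c_str() const { return str_; }
    constexpr std::size_t size() const { return len_; }

private:
    const char* str_;
    std::size_t len_;
};

// String-literal operator: only usable as "text"_lit, so non-literal
// char arrays cannot reach it. len excludes the terminating '\0'.
constexpr Literal operator "" _lit(const char* str, std::size_t len) {
    return Literal(str, len);
}

// The length is available in a constant expression.
static_assert("Hello World!"_lit.size() == 12, "length computed at compile time");
```

With this design, `Literal greet = "Hello World!"_lit;` works, while there is simply no syntax for applying `_lit` to a variable such as `broke`.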

Yes. You can generate a compile-time error with the following preprocessor macro:
#define IS_STRING_LITERAL(X) "" X ""
If you pass anything other than a string literal, compilation fails: adjacent string literals are concatenated, but an expression such as "" broke "" is a syntax error. Usage:
Literal greet(IS_STRING_LITERAL("Hello World!")); // ok
Literal l(IS_STRING_LITERAL(broke)); // error
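A self-contained sketch of the macro in use, with an assumed minimal Literal that deduces the array size as in the question. Concatenating the empty literals adds no characters, so the deduced size is unchanged.

```cpp
#include <cstddef>

// Adjacent string literals are concatenated during translation, so this
// expansion is only well-formed when X is itself a string literal.
#define IS_STRING_LITERAL(X) "" X ""

// Hypothetical minimal Literal, shaped like the one in the question.
class Literal {
public:
    template <std::size_t N>
    constexpr Literal(const char (&str)[N]) : str_(str), len_(N - 1) {}
    constexpr const char* c_str() const { return str_; }
    constexpr std::size_t size() const { return len_; }

private:
    const char* str_;
    std::size_t len_;
};

constexpr Literal greet(IS_STRING_LITERAL("Hello World!"));  // OK
static_assert(greet.size() == 12, "the empty literals add no characters");

// char broke[] = { 'a', 'b', 'c' };
// Literal l(IS_STRING_LITERAL(broke));  // error: expands to "" broke ""
```

The check costs nothing at run time, but it only works if every call site goes through the macro.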

With a C++11 compiler that fully supports constexpr, we can use a constexpr constructor that calls a constexpr function. If the precondition (a trailing zero character) is not fulfilled, that function evaluates a throw-expression, which is not a constant expression, so initializing a constexpr object fails with a compile-time error. The following code expands on UncleBens's code and is inspired by an article on Andrzej's C++ blog:
#include <cstdlib>
class Literal
{
public:
template <std::size_t N> constexpr
Literal(const char (&str)[N])
: mStr(str),
mLength(checkForTrailingZeroAndGetLength(str[N - 1], N))
{
}
template <std::size_t N> Literal(char (&str)[N]) = delete;
private:
const char* mStr;
std::size_t mLength;
struct Not_a_CString_Exception{};
constexpr static
std::size_t checkForTrailingZeroAndGetLength(char ch, std::size_t sz)
{
return (ch) ? throw Not_a_CString_Exception() : (sz - 1);
}
};
constexpr char broke[] = { 'a', 'b', 'c' };
//constexpr Literal lit = (broke); // causes compile time error
constexpr Literal bla = "bla"; // constructed at compile time
I tested this code with GCC 4.8.2. Compilation with the MS Visual C++ 2013 CTP failed, because that compiler did not yet fully support constexpr (constexpr member functions were not supported).
I should probably mention that my first (and preferred) approach was to simply insert
static_assert(str[N - 1] == '\0', "Not a C string.");
in the constructor body. That failed with a compilation error. I first suspected that constexpr constructors must have an empty body, but that is not the cause: C++11 does allow static_assert declarations in a constexpr constructor body, and C++14 relaxes the body rules further. The real problem is that str is a function parameter, so str[N - 1] is not a constant expression and cannot be used in a static_assert.
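With C++14's relaxed constexpr rules, the check can move into the constructor body as an ordinary if/throw instead of going through a helper function. The following is a sketch under that assumption (C++14 or later); the exception type name NotACString is hypothetical. If constant evaluation reaches the throw, the initializer is not a constant expression, so a constexpr Literal built from an unterminated array is rejected at compile time.

```cpp
#include <cstddef>

class Literal {
public:
    template <std::size_t N>
    constexpr Literal(const char (&str)[N]) : mStr(str), mLength(N - 1) {
        if (str[N - 1] != '\0')
            throw NotACString{};  // not allowed in a constant expression
    }
    template <std::size_t N>
    Literal(char (&str)[N]) = delete;  // reject mutable arrays outright

    constexpr const char* c_str() const { return mStr; }
    constexpr std::size_t size() const { return mLength; }

private:
    struct NotACString {};
    const char* mStr;
    std::size_t mLength;
};

constexpr Literal bla = "bla";
static_assert(bla.size() == 3, "length computed at compile time");

constexpr char broke[] = { 'a', 'b', 'c' };
// constexpr Literal lit = broke;  // error: evaluation reaches the throw
```

A non-constexpr Literal built from `broke` would still compile and throw at run time, so the compile-time guarantee only applies to constexpr objects.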

No, there is no way to do this. String literals have a particular type, and overload resolution is based on that type, not on whether the argument is a literal. Any function that accepts a string literal will also accept any other value of the same type.
If your function absolutely depends on its argument being a string literal, you should probably revisit the function: it depends on a guarantee it cannot enforce.

A string literal does not have a separate type to distinguish it from a const char array.
The following, however, makes it slightly harder to accidentally pass (non-const) char arrays:
#include <cstdlib>
struct Literal
{
template< std::size_t N >
Literal( const char (&literal)[N] ){}
template< std::size_t N >
Literal( char (&literal)[N] ) = delete;
};
int main()
{
Literal greet( "Hello World!" );
char a[] = "Hello world";
Literal broke(a); //fails
}
As for runtime checking: is the only problem with a non-literal that it may not be null-terminated? Since you know the size of the array, you can loop over it (preferably backwards) to check whether it contains a '\0'.
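A minimal sketch of that backwards scan; the name containsNul is hypothetical. For a properly terminated array the '\0' sits in the last element, so the backwards loop finds it on the first step. Marking the function constexpr (C++14, because of the loop) lets the same code also run at compile time.

```cpp
#include <cstddef>

// Returns true if the array contains a '\0' anywhere.
template <std::size_t N>
constexpr bool containsNul(const char (&arr)[N]) {
    for (std::size_t i = N; i > 0; --i) {
        if (arr[i - 1] == '\0')
            return true;
    }
    return false;
}
```

At run time, a constructor could call `containsNul(literal)` and throw or assert when it returns false.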

I once came up with a C++98 version that uses an approach similar to the one proposed by k.st. I'll add it for the sake of completeness, to address some of the critique of the C++98 macro.
This version tries to enforce good behavior by preventing direct construction via a private constructor and moving the only accessible factory function into a detail namespace, which in turn is used by the "official" creation macro. It's not exactly pretty, but it is a bit more foolproof: users who want to misbehave at least have to explicitly use functionality that is obviously marked as internal. As always, there is no way to protect against intentional malice.
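// Forward declarations, so that the friend declaration below can name the factory.
class StringLiteral;
namespace detail {
StringLiteral CreateStringLiteral(const char* str);
} // namespace detail
class StringLiteral
{
private:
// Direct usage is forbidden. Use STRING_LITERAL() macro instead.
friend StringLiteral detail::CreateStringLiteral(const char* str);
explicit StringLiteral(const char* str) : m_string(str)
{}
public:
operator const char*() const { return m_string; }
private:
const char* m_string;
};
namespace detail {
inline StringLiteral CreateStringLiteral(const char* str)
{
return StringLiteral(str);
}
} // namespace detail
// Plain juxtaposition: adjacent string literals are concatenated, while an
// identifier followed by "" is a syntax error. (Pasting with ## would not
// produce a valid token even for two literals.)
#define STRING_LITERAL_INTERNAL(a, b) detail::CreateStringLiteral(a b)
/**
* \brief The only way to create a \ref StringLiteral "StringLiteral" object.
* This will not compile if used with anything that is not a string literal.
*/
#define STRING_LITERAL(str) STRING_LITERAL_INTERNAL(str, "")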


When does the L prefix matter for char or wchar_t literals?

Most of the time, when I forget the L prefix on a wchar_t literal or put one on a char literal by mistake, compilation (g++) doesn't complain (no error, no warning) and my program acts as intended.
For example,
char cString[] = "Hello World!";
*std::strchr(cString, L'W') = L'w';
std::cout << cString << std::endl;
and
wchar_t cWideString[] = L"Hello World!";
*std::wcschr(cWideString, 'W') = 'w';
std::wcout << cWideString << std::endl;
both work.
Is it because, in this case, 'W' and 'w' are single-byte characters?
I'm interested, because I would like to use this on purpose, for functions like:
template<typename T> T* findNextSpace(T* str);
intended to be used for T equal to char, const char, wchar_t, or const wchar_t. Would it be safe to use ' ' in the definition of such a function, for any character type T?
Or should I use something like (T)' ' to cast the literal to the correct type?
EDIT: I know it makes a difference for char* and wchar_t*, but my question is not about string literals, it is about character literals.
Both cases fall under the rules for implicit conversions, specifically the rules for integral conversions. The implicit conversion from char to wchar_t should be well defined in all cases, as wchar_t is at least as wide as char. Your compiler might warn you about possible data loss in the conversion from wchar_t to char if the value of the wchar_t is (or, when the value isn't known at compile time, could be) outside the range representable by char.
As I understand it, the integral conversion rules apply to prvalue expressions of integer type; characters are of integer type, and literals (except string literals) are prvalues. That is why you see this behavior with character literals but not with string literals.
I don't have a copy of the standard to look things up in, but cppreference is generally considered a reasonable source of information. Hopefully it is accurate and I interpreted the rules correctly.
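To illustrate the conversion for the question's use case, here is a hypothetical findNextBlank that searches only for ' '. The narrow literal is converted to the string's character type by the ordinary integral conversion; the static_cast merely makes that conversion explicit. This assumes ' ' has the same value in the narrow and wide execution character sets, which holds on common implementations (ASCII-based narrow encodings with UTF-16 or UTF-32 wide encodings). constexpr (C++14) is used only so the behavior can be checked at compile time.

```cpp
#include <type_traits>

// Finds the next ' ' in a null-terminated string of any character type
// (char, const char, wchar_t, const wchar_t, ...).
template <typename T>
constexpr T* findNextBlank(T* str) {
    using Char = typename std::remove_const<T>::type;
    const Char blank = static_cast<Char>(' ');  // narrow literal, converted
    for (; *str != Char(); ++str) {             // Char() is the terminator
        if (*str == blank)
            return str;
    }
    return nullptr;  // no blank before the terminator
}
```

Writing `*str == ' '` without the cast would behave the same way, because ' ' is promoted or converted before the comparison.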
As for your question about finding the next space, you should probably ask it as a separate question. That said, you should probably use std::isspace/std::iswspace (unless you specifically want only ' ') and have the compiler select the appropriate function based on T:
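#include <type_traits>
#include <cwctype>
#include <cctype>
// Always false, but dependent on T, so the static_assert below only fires
// if the unsupported branch is actually instantiated
template <class>
inline constexpr bool dependent_false = false;
template <class T>
T* findNextSpace(T* str) {
    using C = std::remove_const_t<T>; // treat char/const char and wchar_t/const wchar_t alike
    for (; *str != C(); ++str) {
        if constexpr (std::is_same_v<C, char>) {
            if (std::isspace(static_cast<unsigned char>(*str))) return str;
        }
        else if constexpr (std::is_same_v<C, wchar_t>) {
            if (std::iswspace(static_cast<std::wint_t>(*str))) return str;
        }
        else {
            static_assert(dependent_false<T>, "Not implemented");
        }
    }
    return nullptr; // no whitespace before the terminator
}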
If your compiler doesn't support the C++17 features used here (if constexpr, std::is_same_v), you can implement a similar solution using template specialization:
#include <cwctype>
#include <cctype>
template <class T>
struct isspace_helper;
template <>
struct isspace_helper<char> {
bool operator()(char c) const {
return std::isspace(static_cast<unsigned char>(c)); // negative values other than EOF are undefined behavior
}
};
template <>
struct isspace_helper<wchar_t> {
bool operator()(wchar_t c) const {
return std::iswspace(c);
}
};
You could also use this with std::string (most people would suggest using std::string instead of char/wchar_t arrays unless you have a good reason not to):
#include <algorithm>
#include <string>
template <class CharT, class Traits, class Alloc>
auto findNextSpace(std::basic_string<CharT, Traits, Alloc>& str) {
return std::find_if(str.begin(), str.end(), isspace_helper<CharT>{});
}
If your compiler doesn't support deduced return types (auto, a C++14 feature), you can write typename std::basic_string<CharT, Traits, Alloc>::iterator instead. Then perhaps provide an overload accepting the string by const reference (returning a const_iterator) for good measure.
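Putting the pre-C++17 pieces together, here is a self-contained sketch. It combines the isspace_helper specializations (with an unsigned char cast added, since std::isspace has undefined behavior for negative values other than EOF) with a pointer-based findNextSpace and the std::basic_string overloads, including the const-reference one, with their return types spelled out. The pointer version and findNextSpaceExample are illustrative additions.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cwctype>
#include <string>
#include <type_traits>

template <class T> struct isspace_helper;

template <> struct isspace_helper<char> {
    bool operator()(char c) const {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }
};

template <> struct isspace_helper<wchar_t> {
    bool operator()(wchar_t c) const {
        return std::iswspace(static_cast<std::wint_t>(c)) != 0;
    }
};

// Pointer version: strip const so T = char and T = const char share a
// helper, then scan up to the terminating null character.
template <class T>
T* findNextSpace(T* str) {
    typedef typename std::remove_const<T>::type Char;
    for (; *str != Char(); ++str)
        if (isspace_helper<Char>()(*str))
            return str;
    return nullptr;  // no whitespace before the terminator
}

// Mutable string: the returned iterator can modify the character.
template <class CharT, class Traits, class Alloc>
typename std::basic_string<CharT, Traits, Alloc>::iterator
findNextSpace(std::basic_string<CharT, Traits, Alloc>& str) {
    return std::find_if(str.begin(), str.end(), isspace_helper<CharT>());
}

// Const string: returns a const_iterator.
template <class CharT, class Traits, class Alloc>
typename std::basic_string<CharT, Traits, Alloc>::const_iterator
findNextSpace(const std::basic_string<CharT, Traits, Alloc>& str) {
    return std::find_if(str.begin(), str.end(), isspace_helper<CharT>());
}

inline void findNextSpaceExample() {
    char narrow[] = "no\tspace";
    assert(findNextSpace(narrow) == narrow + 2);  // '\t' counts as whitespace
    const wchar_t* wide = L"abc";
    assert(findNextSpace(wide) == nullptr);       // T = const wchar_t
    std::string s = "hello world";
    *findNextSpace(s) = '_';                      // mutable overload
    assert(s == "hello_world");
    const std::wstring w = L"a b";
    assert(findNextSpace(w) - w.begin() == 1);    // const overload
}
```

Overload resolution keeps the pointer and string versions apart, because a std::string argument cannot deduce T* and a pointer cannot deduce basic_string.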

Why aren't string literals passed as references to arrays instead of opaque pointers?

In C++, the type of string literals is const char [N], where N, as std::size_t, is the number of characters plus one (the zero-byte terminator). They reside in static storage and are available from program initialization to termination.
Often, functions taking a constant string don't need the interface of std::basic_string, or would prefer to avoid dynamic allocation; they may just need, for instance, the string itself and its length. std::basic_string, in particular, has to offer a way to be constructed from the language's native string literals. Such functions offer a variant that takes a C-style string:
void function_that_takes_a_constant_string ( const char * /*const*/ s );
// Array-to-pointer decay happens, and takes away the string's length
function_that_takes_a_constant_string( "Hello, World!" );
As explained in this answer, arrays decay to pointers, and their dimensions are lost in the process. In the case of string literals, this means that their length, which was known at compile time, is lost and must be recalculated at run time by iterating through the pointed-to memory until a zero byte is found. This is not optimal.
However, string literals, and, in general, arrays, may be passed as references using template parameter deduction to keep their size:
template<std::size_t N>
void function_that_takes_a_constant_string ( const char (& s)[N] );
// Transparent, and the string's length is kept
function_that_takes_a_constant_string( "Hello, World!" );
The template function could serve as a proxy for the real function, which would take a pointer to the string and its length. That way, little code would be exposed in the header, and the length would be preserved.
// Calling the wrapped function directly would be cumbersome.
// This wrapper is transparent and preserves the string's length.
template<std::size_t N> inline auto
function_that_takes_a_constant_string
( const char (& s)[N] )
{
// `s` decays to a pointer
// `N-1` is the length of the string
return function_that_takes_a_constant_string_private_impl( s , N-1 );
}
// Isn't everyone happy now?
function_that_takes_a_constant_string( "Hello, World!" );
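A compilable sketch of this proxy idea, reusing the question's hypothetical function names. The private implementation simply returns the length it received, which is an illustrative assumption standing in for real work.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the "real" function: it receives pointer and length, so it
// never has to scan for the terminator.
inline std::size_t
function_that_takes_a_constant_string_private_impl(const char* s, std::size_t len)
{
    (void)s;  // a real implementation would use the characters
    return len;
}

// Thin proxy: N is deduced from the array reference, so the literal's
// compile-time length is forwarded instead of being recomputed.
template <std::size_t N>
inline std::size_t function_that_takes_a_constant_string(const char (&s)[N])
{
    return function_that_takes_a_constant_string_private_impl(s, N - 1);  // N counts the '\0'
}

inline void proxyExample()
{
    assert(function_that_takes_a_constant_string("Hello, World!") == 13);
}
```

Note that a partially used buffer such as `char buf[32] = "hi";` would report 31 here, not 2. This is the concern about buffers raised in the answers.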
Why isn't this used more broadly? In particular, why doesn't std::basic_string have a constructor with the proposed signature?
Note: I don't know what the proposed parameter is called; if you know, please suggest an edit to the question's title.
It's largely historical, in a sense. While you're correct that there's no real reason this can't be done (if you don't want to use your whole buffer, pass a length argument, right?), it's still true that if you have a character array, it's usually a buffer, not all of which you're using at any one time:
char buf[MAX_LEN];
Since this is usually how they're used, it seems needless or even risky to go to the trouble of adding a new basic_string constructor template for const CharT (&)[N].
The whole thing is pretty borderline though.
The trouble with adding such a templated overload is simple:
It would be used whenever the function is called with a static buffer of char type, even if the buffer as a whole is not a string and you really wanted to pass only the initial string (embedded zeroes are far less common than terminating zeroes, and using only part of a buffer is very common). Current code rarely decays an array to a pointer to its first element explicitly (with a cast or a function call), so existing calls would silently switch to the new overload.
Demo code (on Coliru):
#include <stdio.h>
#include <string.h>
auto f(const char* s, size_t n) {
printf("char* size_t %u\n", (unsigned)n);
(void)s;
}
auto f(const char* s) {
printf("char*\n");
return f(s, strlen(s));
}
template<size_t N> inline auto
f( const char (& s)[N] ) {
printf("char(&)[%u]\n", (unsigned)N);
return f(s, N-1);
}
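int main() {
char buffer[] = "Hello World"; // char[12]
f(buffer);   // template chosen: reports N - 1 == 11
f(+buffer);  // unary + forces decay to char*: strlen gives 11
buffer[5] = 0;
f(buffer);   // template again: still reports 11, ignoring the new '\0'
f(+buffer);  // strlen now gives 5
}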
Keep in mind: If you talk about a string in C, it always denotes a 0-terminated string, while in C++ it can also denote a std::string, which is counted.
I believe this is addressed in C++14, building on user-defined string literals:
http://en.cppreference.com/w/cpp/string/basic_string/operator%22%22s
#include <string>
int main()
{
//no need to write 'using namespace std::literals::string_literals'
using namespace std::string_literals;
std::string s2 = "abc\0\0def"; // forms the string "abc"
std::string s1 = "abc\0\0def"s; // form the string "abc\0\0def"
}
You can create a helper class that solves this without an overload for every function:
#include <cstddef>
#include <cstdio>
#include <string>
struct string_view
{
    const char* ptr;
    std::size_t size; // includes the terminating '\0'
    template<std::size_t N>
    string_view(const char (&s)[N]) : ptr(s), size(N)
    {
    }
    string_view(const std::string& s) : ptr(s.data()), size(s.size() + 1) // for '\0' at end
    {
    }
};
void f(string_view s)
{
    std::printf("%s (size %u)\n", s.ptr, (unsigned)s.size);
}
int main()
{
    string_view s { "Hello world!" };
    f(s);
    f("test");
}
You could extend this class with helper functions (such as begin and end) to simplify its use in your program.
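A sketch of the helper extended with begin() and end(), so it works with range-based for. It is renamed to the hypothetical str_ref to avoid confusion with C++17's std::string_view. As a design choice that differs from the answer, size excludes the terminating '\0', so end() points at the terminator rather than past it. Like any non-owning view, it must not outlive the string it refers to. count_char is a hypothetical example function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct str_ref {
    const char* ptr;
    std::size_t size;  // number of characters, without the '\0'

    template <std::size_t N>
    str_ref(const char (&s)[N]) : ptr(s), size(N - 1) {}
    str_ref(const std::string& s) : ptr(s.data()), size(s.size()) {}

    const char* begin() const { return ptr; }
    const char* end() const { return ptr + size; }
};

// One function accepts both literals and std::string.
inline std::size_t count_char(str_ref s, char c) {
    std::size_t n = 0;
    for (char ch : s)  // range-for uses begin()/end()
        if (ch == c) ++n;
    return n;
}

inline void strRefExample() {
    assert(count_char("hello world", 'o') == 2);
}
```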

Invalid assignment to unsigned char C++

I wrote the following, but for some reason calling InstructionVal(b) is invalid.
IntelliSense reports:
Only () is allowed for initializer member NPPInstructionDef::InstructionVal
Here is the offending code:
//Single Instruction Definition for Instruction Dictionary
typedef struct NPPInstructionDef
{
const char* InstructionName;
const unsigned char* InstructionVal[];
NPPInstructionDef(const char* a, const unsigned char* b[]): InstructionName(a), InstructionVal()
{
}
}NPPInstruction;
Any Ideas? Thanks.
First, I'm assuming your initialization is InstructionVal( b ), rather than the InstructionVal() which you've written.
But even then, what you've written shouldn't compile.
This is the usual problem, due to the fact that C-style arrays are broken and shouldn't be used. Your definition:
unsigned char const* InstructionVal[];
defines an array of unknown length (thus illegal in a class definition) of unsigned char const*. There's no way to initialize this in an initialization list, except by () (value initialization).
What you want is:
std::vector <unsigned char> InstructionVal;
, and the constructor should be:
NPPInstructionDef( std::string const& a,
std::vector <unsigned char> const& b );
, or perhaps more likely:
template <typename Iterator>
NPPInstructionDef( std::string const& a,
Iterator begin,
Iterator end )
: InstructionName( a )
, InstructionVal( begin, end )
{
}
(This supposes, of course, that InstructionName is std::string instead of char const*, which avoids any issues with the lifetime of the string, for example, and allows easy comparison, etc.)
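A compilable sketch of the suggested design, with both constructors from the answer. Storing the instruction as bytes (std::vector<unsigned char>) is an assumption about the intent. The instruction name "NOP_RET" and its byte values are made up for illustration. A plain C array of bytes can still serve as the source through the iterator constructor.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct NPPInstructionDef {
    std::string InstructionName;
    std::vector<unsigned char> InstructionVal;

    NPPInstructionDef(const std::string& a, const std::vector<unsigned char>& b)
        : InstructionName(a), InstructionVal(b) {}

    // Copies any iterator range of bytes (pointers into a C array work too).
    template <typename Iterator>
    NPPInstructionDef(const std::string& a, Iterator begin, Iterator end)
        : InstructionName(a), InstructionVal(begin, end) {}
};

inline void instructionExample() {
    const unsigned char bytes[] = {0x90, 0xC3};
    NPPInstructionDef def("NOP_RET", bytes, bytes + 2);
    assert(def.InstructionVal.size() == 2);
    assert(def.InstructionVal[1] == 0xC3);
}
```

Because the struct now owns its data, its members no longer depend on the lifetime of whatever the caller passed in.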

constant expression in c++ template argument

I have a template that takes a char argument, like:
A<'T'>
I am storing my T in a variable like:
const char ch = str[0]; //str is a string from my program
constexpr char ch = str[0]; // this doesn't work for me either
I am trying to achieve this:
A<ch>();
I am using GCC 4.7 and have dabbled with constexpr, but I haven't been able to get it to work.
Any idea of a way to get this to work?
Any help is appreciated
This can only work if everything is a constant expression:
constexpr char str[] = "Hello World";
constexpr char ch = str[0];
A<ch> x;
If the contents of str are defined at runtime, then there is no way to achieve that. The compiler requires your template value to be set during compilation.
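A self-contained version of the constexpr example. The question doesn't show A, so the definition below (a class template that just exposes its argument) is hypothetical. Every step is a constant expression, so ch is usable as a template argument.

```cpp
// Hypothetical stand-in for the question's A<char>.
template <char C>
struct A {
    static constexpr char value = C;
};

constexpr char str[] = "Hello World";
constexpr char ch = str[0];  // constant expression: str is constexpr
A<ch> x;

static_assert(A<ch>::value == 'H', "first character chosen at compile time");
```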
That is why this is valid:
A<'a'>();
Since 'a' is a constant value, known during compilation. But this:
void foo(const std::string &value) {
A<value[0]> t;
}
is not, since value[0], despite being const, is not known during compilation.
