Compile time encryption for strings using user-defined literals - c++

I am aware that the new C++ standard allows for user-defined literals and that they can be evaluated at compile time.
However, I am quite new to template metaprogramming. I'm trying to get some examples working, without success so far, since this is my first contact with this particular feature.
So let's say I have a string:
std::string tmp = "This is a test message";
And I would like to encrypt it at compile time using:
std::string tmp = "This is a test message"_encrypt;
Is what I'm attempting even possible?
I am currently using VS2015, so any help or feedback is appreciated.

Is what I'm attempting even possible?
Yes, it is possible*. What you can pre-compute and put directly in the source code can also be done by the compiler at compile time.
However, you cannot use std::string. It's not a literal type. Something like:
constexpr std::string tmp = "some string literal";
will not compile before C++20, because std::string, and std::basic_string in general, has no constexpr constructor.
You must therefore use const char [] as input for your meta-programming; after that, you may assign it to a std::string.
NB: Meta-programming has some restrictions you need to take into account: you don't have access to many tools you'd otherwise have, like new or malloc, so your variables must be allocated on the stack.
*Edit: Not entirely with UDLs, as @m.s. points out. Indeed, you receive a pointer to const chars and the length of the string as an ordinary function argument. This is pretty restrictive in a constexpr scenario, and I doubt it's possible to find a way to work on that string. In "normal" meta-programming, where the size is a constant expression (for example a template parameter), compile-time encryption is possible.

Related

#define or constexpr, which is more suitable here to maximal efficiency?

I have a constant string value
std::string name_to_use = "";
I need to use this value in just one place, calling the below function on it
std::wstring foo (std::string &x) {...};
// ...
std::wstring result = foo (name_to_use);
I can simply not declare the variable and use a string literal in the function call instead, but to allow easy configuration of name_to_use I decided to declare at the beginning of the file.
Now, since I am not really modifying name_to_use I thought why not use a #define preprocessing directive so I do not have to store name_to_use as a const anywhere in memory while the main program runs continuously (a GUI is displayed).
It worked fine, but then I came across constexpr. A user on stackoverflow has said to use it instead of #define as it is a safer option.
However, constexpr std::string name_to_use is still going to leak memory in this case right? Since it's not actually replacing occurrences of name_to_use with a value but holding a reference to it computed at compile time (which does not offer me any benefit here anyway, if I'm not mistaken?).
If you #define it to "", then every call performs a conversion from a C string to std::string, which is pretty inefficient. However, you can (usually) pass macro definitions as arguments to the compiler, which helps customization. Even in that case, it makes sense to write static const std::string name_to_use (a std::string cannot be constexpr before C++20).
With static const std::string name_to_use = ...;, the conversion happens once, at initialization, instead of at every call. Don't assume the compiler won't optimize: if it's a compile-time string, the entire function might be optimized away (but the program still behaves as if the object exists, per the as-if rule).
To combine the two, you can do:
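#define STRINGIFY_IMPL(x) #x
#define STRINGIFY(x) STRINGIFY_IMPL(x)
#ifdef NAME_TO_USE // e.g. compile with -DNAME_TO_USE=my_name
static const std::string name_to_use = STRINGIFY(NAME_TO_USE);
#else
static const std::string name_to_use = "";
#endif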
Also, as others said, please consider std::string_view to avoid allocation.
That user is right, and you understood correctly.
Using the constexpr method will allocate a constant, whereas the macro simply substitutes its text at compile time. The only benefit of the former is that it is typed, which can make your code a little safer at compile time.
That said, the choice is yours. Do you want an untyped macro that adds no run-time operation, or a typed constant that uses a little bit of memory?

Doesn't gsl::cstring_span support constexpr? If not, why should I use it?

I have a piece of code that looks like this:
constexpr gsl::cstring_span<> const somestring{"Hello, I am a string"};
and it refuses to compile with a message complaining that some non-constexpr function is being called somewhere.
Why is this? This seems like the most important use-case to support. The whole point is to have compile-time bounds checking if at all possible. Compile time bounds checking involving constant string literals seems like the thing it would be used for the most often. But this can't happen if it can't be declared constexpr. What's going on here?
I think the problem is that string literals have type array of const char and are null-terminated. But who is to say you are constructing your cstring_span from a null-terminated array?
Because of that, the constructor of cstring_span does a run-time check to remove the null terminator if it exists, and otherwise accepts the full length of the array.
I am not sure how powerful constexpr expressions can be, but it may be possible to implement it in a constexpr way. You could create an issue asking about it here:
https://github.com/Microsoft/GSL/issues

Compile Time Error vs Run Time Error

I am confused about why the compiler gives a compile-time error in the first case but only a run-time error in the second:
const char s[]="hello";
s[2]='t'; // Compile Time Error
char *t = "hello";
*(t+2)='u'; // Run time Error
I would expect a compile-time error in both cases. Can anyone tell me the particular reason it is this way?
In the first case, you are writing to a const and the compiler notices that and can reject that.
In the second case, t is a pointer to a non-const char, so you can dereference it and write at *(t+2). However, since t is initialized with a pointer to a read-only segment, you are getting a segmentation violation at runtime.
You could painfully configure your linker to put all data in writable segments. This is ugly and non standard.
P.S. Some sophisticated static analyzers (maybe Frama-C) might catch both errors without running the program. One could also imagine extending GCC e.g. with MELT to add such checks (but this is non-trivial work, and it might be hard to get funded for it).
Backwards compatibility.
You can't modify a const char. That much is obvious.
What isn't obvious is that the type of a string literal is actually a pointer to constant characters, not a pointer to characters. The second declaration, actually, therefore has a wrong type. This is supported, however, for historical reasons.
Note that the above is a bit of a lie. Rather than pointers, string literals are actually char[] types.
In particular, the type of a string literal is char[] rather than const char[] in C89, C99 and C11. It's not actually wrong then, but it's undefined behavior to try to write to it, and the data is often stored in a read-only segment.
Also, for what it's worth, you can use -Wwrite-strings with gcc (g++ already includes it) to be warned of this.
When you do: char *t = "hello"; then t points to memory in a read-only section of the program (often next to the code), so you can't change it. Because it's read-only, you get a segmentation fault at run time.
When you do: const char s[]="hello"; then s is an array of chars (on the stack, if declared inside a function), but it's const, so you can't change it and you get a compilation error (the compiler knows it is const, so it doesn't allow you to change it).
Using const when you don't want your string to be changed is good because it turns mistakes into compilation errors, which are far better than run-time errors.
Consider the following series of statements:
char *t = "hello";
char s[5];
t = s;
*(t+2)='u';
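This series of statements gives no run-time error, because after t = s; the statement *(t+2)='u'; writes into the writable array s, which is valid. In your case, the same statement tries to modify a const (read-only) memory location, but the compiler in general has no way of knowing what a pointer will point to when the write happens, so it cannot tell whether an access violation will occur.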

Why are string literals const?

It is known that in C++ string literals are immutable and the result of modifying a string literal is undefined. For example
char * str = "Hello!";
str[1] = 'a';
This results in undefined behavior.
Besides that, string literals are placed in static memory, so they exist for the whole run of the program. I would like to know why string literals have these properties.
There are a couple of different reasons.
One is to allow storing string literals in read-only memory (as others have already mentioned).
Another is to allow merging of string literals. If one program uses the same string literal in several different places, it's nice to allow (but not necessarily require) the compiler to merge them, so you get multiple pointers to the same memory, instead of each occupying a separate chunk of memory. This can also apply when two string literals aren't necessarily identical, but do have the same ending:
const char *foo = "long string";
const char *bar = "string";
In a case like this, it's possible for bar to be foo+5 (if I'd counted correctly).
In either of these cases, if you allow modifying a string literal, it could modify the other string literal that happens to have the same contents. At the same time, there's honestly not a lot of point in mandating merging either: it's pretty uncommon to have so many overlapping string literals that most people would want the compiler to run slower just to save (maybe) a few dozen bytes of memory.
By the time the first standard was written, there were already compilers that used all three of these techniques (and probably a few others besides). Since there was no way to describe one behavior you'd get from modifying a string literal, and nobody apparently thought it was an important capability to support, they did the obvious: said even attempting to do so led to undefined behavior.
It's undefined behaviour to modify a literal because the standard says so. And the standard says so to allow compilers to put literals in read only memory. And it does this for a number of reasons. One of which is to allow compilers to make the optimisation of storing only one instance of a literal that is repeated many times in the source.
I believe you ask about the reason why literals are placed in
read-only memory, not about technical details of linker doing this and
that or legal details of a standard forbidding such and such.
When modification of string literals works, it leads to subtle bugs
even in the absence of string merging (which we have reasons to
disallow if we decided to permit modification). When you see code like
char *str="Hello";
.../* some code, but str and str[...] are not modified */
printf("%s world\n", str);
it's a natural conclusion that you know what's going to be printed,
because str (and its content) were not modified in a particular
place, between initialization and use.
However, if string literals are writable, you don't know it any
more: str[0] could be overwritten later, in this code or inside a
deeply nested function call, and when the code is run again,
char *str="Hello";
won't guarantee anything about the content of str anymore. As we
expect, this initialization is implemented as moving the address known
at link time into a place for str. It does not check that str
contains "Hello" and it does not allocate a new copy of it. However,
we understand this code as resetting str to "Hello". It's hard to
overcome this natural understanding, and it's hard to reason about the
code where it is not guaranteed. When you see an expression like
x+14, what if you had to think about 14 being possibly overwritten
in other code, so it became 42? The same problem with strings.
That's the reason to disallow modification of string literals, both in
the standard (with no requirement to detect the failure early) and in
actual target platforms (providing the bonus of detecting potential
bugs).
I believe that many attempts to explain this thing suffer from the
worst kind of circular reasoning. The standard forbids writing to
literals because the compiler can merge strings, or they can be placed
in read-only memory. They are placed in read-only memory to catch the
violation of the standard. And it's valid to merge literals because
the standard forbids... is it a kind of explanation you asked for?
Let's look at other
languages. Common Lisp standard
makes modification of literals undefined behaviour, even though the
history of preceding Lisps is very different from the history of C
implementations. That's because writable literals are logically
dangerous. Language standards and memory layouts only reflect that
fact.
Python language has exactly one place where something resembling
"writing to literals" can happen: parameter default values, and this
fact confuses people all the time.
Your question is tagged C++, and I'm unsure of its current state
with respect to implicit conversion to non-const char*: if it's a
conversion, is it deprecated? I expect other answers to provide a
complete enlightenment on this point. As we talk of other languages
here, let me mention plain C. Here, string literals are not const,
and an equivalent question to ask would be why can't I modify string
literals (and people with more experience ask instead, why are
string literals non-const if I can't modify them?). However, the
reasoning above is fully applicable to C, despite this difference.
Because in K&R C there was no such thing as const, and similarly in pre-ANSI C++. Hence there was a lot of code with things like char * str = "Hello!"; If the standards committee had made string literals const, all those programs would no longer have compiled. So they made a compromise: string literals are officially const char[], but they have a silent implicit conversion to char* (deprecated in C++98 and removed in C++11).
In C++, string literals are const because you aren't allowed
to modify them. In standard C, they would have been const as
well, except that when const was introduced into C, there was
so much code along the lines of char* p = "something"; that
making them const would have broken it, which was deemed
unacceptable. (The C++ committee chose a different solution to
this problem, with a deprecated implicit conversion which allows
the above.)
In the original C, string literals were not const, and were
mutable, and it was guaranteed that no two string literals shared
any memory. This was quickly realized to be a serious error,
allowing things like:
void
mutate(char* p)
{
static char c = 'a';
*p = c++;
}
and in another module:
mutate( "hello" ); // Can't trust what is written, can you.
(Some early implementations of Fortran had a similar issue,
where F(4) might call F with just about any integral value.
The Fortran committee fixed this, just like the C committee
fixed string literals in C.)

Why use c strings in c++?

Is there any good reason to use C-strings in C++ nowadays? My textbook uses them in examples at some points, and I really feel like it would be easier just to use a std::string.
The only reasons I've had to use them is when interfacing with 3rd party libraries that use C style strings. There might also be esoteric situations where you would use C style strings for performance reasons, but more often than not, using methods on C++ strings is probably faster due to inlining and specialization, etc.
You can use the c_str() method in many cases when working with those sorts of APIs, but you should be aware that the pointer it returns is a const char*, and you should not modify the string through it. In those situations, you can still use a vector<char> instead, and at least get the benefit of easier memory management.
A couple more memory control notes:
C strings are POD types, so they can be allocated in your application's read-only data segment. If you declare and define std::string constants at namespace scope, the compiler will generate additional code that runs before main() that calls the std::string constructor for each constant. If your application has many constant strings (e.g. if you have generated C++ code that uses constant strings), C strings may be preferable in this situation.
Some implementations of std::string support a feature called SSO ("short string optimization" or "small string optimization") where the std::string class contains storage for strings up to a certain length. This increases the size of std::string but often significantly reduces the frequency of free-store allocations/deallocations, improving performance. If your implementation of std::string does not support SSO, then constructing a std::string on the stack may still perform a free-store allocation. If that is the case, using temporary stack-allocated C strings may be helpful for performance-critical code that uses strings. Of course, you have to be careful not to shoot yourself in the foot when you do this.
Because that's how they come from numerous APIs/libraries?
Let's say you have some string constants in your code, which is a pretty common need. It's better to define these as C strings than as C++ objects -- more lightweight, portable, etc. Now, if you're going to be passing these strings to various functions, it's nice if these functions accept a C string instead of requiring a C++ string object.
Of course, if the strings are mutable, then it's much more convenient to use C++ string objects.
If a function needs a constant string I still prefer to use 'const char*' (or const wchar_t*) even if the program uses std::string, CString, EString or whatever elsewhere.
There are just too many sources of strings in a large code base to be sure the caller will have the string as a std::string and 'const char*' is the lowest common denominator.
Textbooks feature old-school C strings because many basic functions still expect them as arguments, or return them. Additionally, it gives some insight into the underlying structure of the string in memory.
Memory control. I recently had to handle strings (actually blobs from a database) about 200-300 MB in size, in a massively multithreaded application. It was a situation where just-one-more copy of the string might have burst the 32bit address space. I had to know exactly how many copies of the string existed. Although I'm an STL evangelist, I used char * then because it gave me the guarantee that no extra memory or even extra copy was allocated. I knew exactly how much space it would need.
Apart from that, standard STL string processing misses out on some great C functions for string processing/parsing. Thankfully, std::string has the c_str() method for const access to the internal buffer. To use printf() you still have to use char* though (what a crazy idea of the C++ team not to include (s)printf-like functionality, one of the most useful functions EVER in C). I hope boost::format will soon be included in the STL.
If the C++ code is "deep" (close to the kernel, heavily dependent on C libraries, etc.) you may want to use C strings explicitly to avoid lots of conversions into and out of std::string. Or, if you're interfacing with other language domains (Python, Ruby, etc.), you might do so for the same reason. Otherwise, use std::string.
Some posts mention memory concerns. That might be a good reason to shun std::string, but char* probably is not the best replacement. It's still an OO language. Your own string class is probably better than a char*. It may even be more efficient - you can apply the Small String Optimization, for instance.
In my case, I was trying to get about 1GB worth of strings out of a 2GB file, stuff them into records with about 60 fields and then sort them 7 times on different fields. My predecessor's code took 25 hours with char*; my code ran in 1 hour.
1) A "string constant" is a C string (an array of const char that decays to const char*); converting it to const std::string& is a run-time process, not necessarily simple or optimized.
2) fstream library uses c-style strings to pass file names.
My rule of thumb is to pass const std::string& if I am about to use the data as std::string anyway (say, when I store them in a vector), and const char * in other cases.
After spending far, far, too much time debugging initialization rules and every conceivable string implementation on several platforms we require static strings to be const char*.
After spending far, far, too much time debugging bad char* code and memory leaks I suggest that all non-static strings be some type of string object ... until profiling shows that you can and should do something better ;-)
Legacy code that doesn't know of std::string. Also, before C++11 opening files with std::ifstream or std::ofstream was only possible with const char* as an input to the file name.
Given the choice, there is generally no reason to choose primitive C strings (char*) over C++ strings (std::string). However, often you don't have the luxury of choice. For instance, std::fstream's constructors take C strings, for historical reasons. Also, C libraries (you guessed it!) use C strings.
In your own C++ code it is best to use std::string and extract the object's C string as needed by using the c_str() function of std::string.
It depends on the libraries you're using. For example, when working with the MFC, it's often easier to use CString when working with various parts of the Windows API. It also seems to perform better than std::string in Win32 applications.
However, std::string is part of the C++ standard, so if you want better portability, go with std::string.
C strings also suit applications, such as most embedded platforms, where you do not have the luxury of a heap to store the strings being manipulated, and where deterministic preallocation of string buffers is required.
C strings don't carry the overhead of being a class.
C strings can generally result in faster code, as they are closer to the machine level.
This is not to say you can't write bad code with them. They can be misused, like every other construct.
There is a wealth of library calls that demand them for historical reasons.
Learn to use C strings and STL strings, and use each when it makes sense to do so.
STL strings are certainly far easier to use, and I don't see any reason to not use them.
If you need to interact with a library that only takes C-style strings as arguments, you can always call the c_str() method of the string class.
The usual reason to do it is that you enjoy writing buffer overflows in your string handling. Counted strings are so superior to terminated strings it's hard to see why the C designers ever used terminated strings. It was a bad decision then; it's a bad decision now.