Truly compile-time string hashing in C++

Basically, I need truly compile-time string hashing in C++. I don't care about the specific technique; it can be templates, macros, anything. All the other hashing techniques I've seen so far can only generate a lookup table (like the 256-entry CRC32 table) at compile time, not an actual hash of the string.
In other words, I need to have this
printf("%d", SOMEHASH("string"));
to be compiled as (in pseudo-assembler)
push HASHVALUE
push "%d"
call printf
even in Debug builds, with no runtime operations on the string. I am using GCC 4.2 and Visual Studio 2008, and I need the solution to work with those compilers (so no C++0x).

The trouble is that in C++03 the result of subscripting a string literal (i.e. accessing a single character) is not a compile-time constant suitable for use as a template parameter.
It is therefore not possible to do this. I would recommend writing a script to compute the hashes and insert them directly into the source code, i.e.
printf("%d", SOMEHASH("string"));
gets converted to
printf("%d", 257359823 /*SOMEHASH("string")*/);

Write your own preprocessor that scans the source for SOMEHASH("...") and replaces each occurrence with the computed hash. Then pass its output to the compiler.
(Similar techniques are used for I18N.)
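The following is a minimal sketch of the substitution step such a preprocessor would perform, written as a C++ function. Several assumptions here are not from the original answer. The hash is 32-bit FNV-1a. The names fnv1a32 and expand_hashes are hypothetical. Only simple literals without escape sequences are recognized. Whatever hash you choose must match the one your code expects SOMEHASH to produce.

```cpp
#include <cstdint>
#include <regex>
#include <string>

// Hypothetical hash choice: 32-bit FNV-1a.
std::uint32_t fnv1a32(const std::string& s) {
    std::uint32_t h = 2166136261u;  // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 16777619u;             // FNV prime
    }
    return h;
}

// Replaces every SOMEHASH("literal") with its numeric hash and keeps the
// original call as a comment. Only literals without escapes or quotes match.
std::string expand_hashes(const std::string& src) {
    static const std::regex call(R"re(SOMEHASH\("([^"\\]*)"\))re");
    std::string out;
    std::string::const_iterator last = src.begin();
    for (std::sregex_iterator it(src.begin(), src.end(), call), end; it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);  // copy text before the match unchanged
        out += std::to_string(fnv1a32(m[1].str())) + "u /*" + m[0].str() + "*/";
        last = m[0].second;
    }
    out.append(last, src.end());       // copy the tail after the last match
    return out;
}
```

Wrapped in a small main that reads a source file and writes the expanded text, this could run as a build step before the compiler. The result is an integer literal, so nothing is left to compute at run time, even in Debug builds. The expanded output also compiles with any C++03 compiler.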

With templates only the following syntax will work:
SOMEHASH<'s','t','r','i','n','g'>
See, for example:
http://arcticinteractive.com/2009/04/18/compile-time-string-hashing-boost-mpl/
or
compile-time string hashing
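For illustration, here is a C++03-compatible sketch of the character-list syntax. It is not the code from the linked blog post. It makes several assumptions: 32-bit FNV-1a as the hash, a 32-bit unsigned int, and a maximum of 8 characters. SomeHash, Fnv1a and fnv1a_runtime are hypothetical names.

```cpp
// Fnv1a<H, C0..C7>: folds C0..C7 into hash H (32-bit FNV-1a, assuming a
// 32-bit unsigned int) and stops once only '\0' characters remain.
template <unsigned H, char C0, char C1, char C2, char C3,
          char C4, char C5, char C6, char C7>
struct Fnv1a {
    static const unsigned value =
        Fnv1a<(H ^ (unsigned char)C0) * 16777619u,
              C1, C2, C3, C4, C5, C6, C7, '\0'>::value;
};

// Termination: nothing left to hash.
template <unsigned H>
struct Fnv1a<H, '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0'> {
    static const unsigned value = H;
};

// Hypothetical user-facing name; unused slots default to '\0'.
template <char C0 = '\0', char C1 = '\0', char C2 = '\0', char C3 = '\0',
          char C4 = '\0', char C5 = '\0', char C6 = '\0', char C7 = '\0'>
struct SomeHash {
    static const unsigned value =
        Fnv1a<2166136261u, C0, C1, C2, C3, C4, C5, C6, C7>::value;
};

// Runtime reference implementation, only for comparison.
inline unsigned fnv1a_runtime(const char* s) {
    unsigned h = 2166136261u;
    for (; *s; ++s) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}
```

SomeHash<'s','t','r','i','n','g'>::value is an integral constant expression, so it can be used as a case label or template argument, and no hashing happens at run time. The price is the awkward spelling and a fixed maximum length. Longer strings need more template parameters.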

You have to wait for user-defined literals in C++0x for this.
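Once C++11 is available, a constexpr function is enough, and a user-defined literal is only a convenient spelling. Note that this rules the approach out for GCC 4.2 and VS2008. The sketch below again assumes 32-bit FNV-1a, and fnv1a and _hash are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// C++11 constexpr 32-bit FNV-1a (hash choice is an assumption).
// A single return statement keeps it a valid C++11 constexpr function.
constexpr std::uint32_t fnv1a(const char* s, std::uint32_t h = 2166136261u) {
    return *s == '\0'
        ? h
        : fnv1a(s + 1, static_cast<std::uint32_t>(
                           (h ^ static_cast<unsigned char>(*s)) * 16777619u));
}

// Hypothetical user-defined literal: "string"_hash.
constexpr std::uint32_t operator""_hash(const char* s, std::size_t) {
    return fnv1a(s);
}
```

In an unoptimized build, a constexpr function called in an ordinary expression may still be evaluated at run time. To guarantee compile-time evaluation, bind the result to a constexpr variable first, e.g. constexpr std::uint32_t h = "string"_hash; and then pass h to printf.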

If you don't mind using the new C++0x standard in your code (some answers also include links to stuff that works in the older C++03 standard), these questions have been asked before on StackOverflow:
Compile-time (preprocessor) hashing of string
Compile time string hashing
Both of those contain answers that will help you figure out how you might implement this.
Here is a blog post that shows how to use Boost.MPL: Compile Time String Hashing

That's not possible. It might be in C++0x, but definitely not in C++03.

Related

GCC (any version) equivalent of clang's __type_pack_element to get the Nth element of a template parameter pack

https://reviews.llvm.org/D15421
clang has __type_pack_element which allows efficient indexing of parameter packs in variadic templates. Is there a GCC equivalent?
I am not interested in using tuple_element_t. I am looking for an alternative that is a compiler primitive.
If you are really brave (or crazy), you could try the same techniques kvasir::mpl uses for this problem. In the metaben.ch benchmarks it is as fast as or faster than the libraries that use __type_pack_element, and it works on GCC. (Note that the benchmark looks up every element. If you only ever index a few, the results look quite different, but usually if you want one element you eventually want the others too.)
benchmark
implementation
old blog post about it
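The kvasir::mpl code itself is best read at the links above. As a simpler illustration of a non-recursive alternative that also works on GCC, the sketch below indexes a pack through overload resolution. Each element becomes a distinct base class tagged with its index, and template argument deduction picks the single base that carries the requested index. This is a library technique, not a compiler primitive, and it is not kvasir's implementation. It needs C++14 for std::index_sequence, and all names are hypothetical.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Pairs an index with a type; the indexer gets one such base per element.
template <std::size_t I, typename T>
struct indexed { using type = T; };

template <typename Seq, typename... Ts>
struct indexer;

template <std::size_t... Is, typename... Ts>
struct indexer<std::index_sequence<Is...>, Ts...> : indexed<Is, Ts>... {};

// Declared only; used inside decltype. With I given explicitly, T is deduced
// from the unique base indexed<I, T> of the indexer (derived-to-base deduction).
template <std::size_t I, typename T>
indexed<I, T> select_base(const indexed<I, T>&);

template <std::size_t I, typename... Ts>
using nth_type = typename decltype(
    select_base<I>(std::declval<indexer<std::index_sequence_for<Ts...>, Ts...>>()))::type;
```

Because all elements are looked up through one class with N bases, there is no chain of N recursive instantiations per lookup. On clang, __type_pack_element<I, Ts...> remains the more direct option.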

Compile-time build-up of std::regex

Since I know the regexes at compile time, and building a regex is O(2^m), where m is the length of the regex, I would love to build the regexes at compile time.
Is this possible with std::regex? (I don't think so, because I don't see any constexpr constructor for basic_regex.)
And if not, is there a regex library that can build my regexes at compile time?
A CppCon 2017 lightning talk by Hana Dusikova, "Regular Expressions Redefined in C++", described an approach to compile-time regular expressions using a user-defined literal for regex strings and a compile-time approach to generating the matching function. The code is on GitHub, but it is still experimental and highly fluid at this time. So it seems that compile-time regexes are probably going to appear sometime soon.
We need to distinguish between compiling the program and compiling the regex. The latter actually happens at program run time and means building a large but efficient structure (a state machine) suitable for fast matching against various strings.
In C++11 <regex>, regex compilation happens when you construct a regex object from a string:
std::regex e (your_re_string);
If you use such an object in regex_match, regex_search or regex_replace, you get the advantage of working with an already-compiled regular expression. So, if you know your string at program compile time, the best thing you can do for speed is to construct the corresponding regex object just once per program run, for example by declaring it somewhere as a static variable with an initializer:
static std::regex e (your_constant_re_string);
This is probably what you want.
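Here is a minimal sketch of the "compile once per run" advice. It uses a function-local static instead of a namespace-scope one. The pattern and the function name is_identifier are made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical example: the regex is compiled when the static is first
// initialized (on the first call); every later call reuses that object.
bool is_identifier(const std::string& s) {
    static const std::regex identifier("[A-Za-z_][A-Za-z0-9_]*");
    return std::regex_match(s, identifier);
}
```

Only the first call pays the construction cost. Since C++11, initialization of a function-local static is thread-safe.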
Alternatively, you can construct the regex on the spot, e.g. regex_match(s, std::regex(pattern)). That is usually more convenient for the programmer, but performance suffers because the regex is compiled again every time such a call runs.
P.S. If you really, really, really want to have your regexp compiled at program compile time, you can:
(1) Use an external regexp/lexer compiler (like https://github.com/madelson/PrecompiledRegex.Fody, Flex https://en.wikipedia.org/wiki/Flex_(lexical_analyser_generator) or similar).
(2) Compile an std::regex object, then serialize it and convert it to C++ source (which is really a DIY version of (1)).
But I'm quite sure it isn't worth it if the only goal is to save one regex compilation per program run, unless perhaps your expressions are truly enormous.

Choose default type and encoding for C++ string literals at compile time

C++11 introduced new string literals for UTF-8, UTF-16 and UTF-32 with the u8, u and U prefixes, but I have to hard-code which one I want to use. I'm looking for a way to select the encoding at compile time (similar to how a typedef works).
User defined string literals don't seem to help as they work on the strings of the specified encoding.
I have seen in pre C++11 code the use of a short macro such as L("string") to choose between "string" and L"string" but personally I find that quite ugly.
Is it possible to neatly choose the default type and encoding or will I have to use the macro option?
Unfortunately, the solution to this problem is to use macros. Although @Nadim Farhat pointed out that you can do a certain amount of choosing with GCC, that is by no means a portable solution.
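Here is a sketch of the macro option. It assumes hypothetical build switches TEXT_UTF16 and TEXT_UTF32 and a macro named TEXT_LIT, chosen so it does not clash with the Windows TEXT macro. Token pasting turns TEXT_LIT("abc") into u"abc" or U"abc". The default branch uses plain narrow literals rather than u8"...", because since C++20 u8 literals have type const char8_t[] and would not match a char typedef.

```cpp
#include <string>

// Hypothetical switches, e.g. compile with -DTEXT_UTF16 or -DTEXT_UTF32.
#if defined(TEXT_UTF32)
typedef char32_t text_char;
#define TEXT_LIT(s) U##s
#elif defined(TEXT_UTF16)
typedef char16_t text_char;
#define TEXT_LIT(s) u##s
#else
typedef char text_char;     // narrow literals in the execution encoding
#define TEXT_LIT(s) s
#endif

typedef std::basic_string<text_char> text_string;
```

This is essentially the L("string") pattern from the question with a typedef alongside it. It is not prettier, but it keeps the choice in one place.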

C++ Compile-Time string manipulation

I looked at Boost's mpl::string, but there doesn't seem to be an easy way of converting string literals to the multi-character-literal format of mpl::string (e.g. 'hell','o'). What I am trying to do is generate, at compile time, an XML representation of some simple data structures using compile-time strings. I want macros to generate the structures themselves and insert into them a constant "meta" field containing that XML string.
The short answer is no, there is no easy way, at least not in C++ alone and at compile time. You can use scripts or some other code generator to produce mpl::strings with the correct literals. C++0x will bring user-defined literals [1], which allow easy manipulation of literals character by character, for example with constexpr functions. (The variadic-template form of literal operator applies only to numeric literals.)
http://en.wikipedia.org/wiki/C%2B%2B0x#User-defined_literals
Here is an article on the subject: http://akrzemi1.wordpress.com/2011/05/11/parsing-strings-at-compile-time-part-i/. The author implements a simple RPN arithmetic calculator that works at compile time using user-defined string literals and constexpr. I won't try to summarize the article further here.
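To illustrate the constexpr side of that approach, here is a tiny C++11 parser that reads the leading decimal digits of a string literal at compile time. This is not the article's code, and parse_uint is a hypothetical name.

```cpp
// C++11 constexpr: parses the leading decimal digits of a string literal.
// Hypothetical helper illustrating the technique, not code from the article.
constexpr unsigned parse_uint(const char* s, unsigned acc = 0) {
    return (*s >= '0' && *s <= '9')
        ? parse_uint(s + 1, acc * 10u + static_cast<unsigned>(*s - '0'))
        : acc;
}

// Evaluated by the compiler: a wrong result is a compile error, not a runtime bug.
static_assert(parse_uint("1234") == 1234u, "parsed at compile time");
```

The recursive single-return style is what C++11 requires of constexpr functions. C++14 relaxes this and allows loops.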

Is there a 'catch' with FastFormat?

I just read about the FastFormat C++ i/o formatting library, and it seems too good to be true: Faster even than printf, typesafe, and with what I consider a pleasing interface:
// prints: "This formats the remaining arguments based on their order - in this case we put 1 before zero, followed by 1 again"
fastformat::fmt(std::cout, "This formats the remaining arguments based on their order - in this case we put {1} before {0}, followed by {1} again", "zero", 1);
// prints: "This writes each argument in the order, so first zero followed by 1"
fastformat::write(std::cout, "This writes each argument in the order, so first ", "zero", " followed by ", 1);
This looks almost too good to be true. Is there a catch? Have you had good, bad or indifferent experiences with it?
Last time I checked, there was one annoying catch:
You can only use either the narrow-string version or the wide-string version of this library. (The functions for wchar_t and char are the same; which type is used is a compile-time switch.)
With iostreams, stdio or Boost.Format you can use both.
I found one "catch", though most people will never run into it. From the project page:
Atomic operation. It doesn't write out statement elements one at a time, like the IOStreams, so has no atomicity issues
The only way I can see this happening is if it buffers the whole write() call's output itself, then writes it out to the ostream in one step. This means it needs to allocate memory, and if an object passed into the write() call produces a lot of output (several megabytes or more), it can consume up to twice that much memory in internal buffers (assuming it uses the grow-a-buffer-by-doubling-its-size-each-time trick).
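The buffering the answer suspects can be sketched as follows. write_buffered is a hypothetical helper, not FastFormat's API, and FastFormat may well do it differently. The point is that the complete output exists in memory before anything reaches the stream.

```cpp
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper, NOT FastFormat's API: formats every argument into a
// private buffer, then hands the finished text to the stream in one write().
template <typename... Args>
void write_buffered(std::ostream& os, const Args&... args) {
    std::ostringstream buf;
    // C++11 pack-expansion trick: streams each argument into buf, left to right.
    using expand = int[];
    (void)expand{0, ((void)(buf << args), 0)...};
    // str() returns a copy, so the text briefly exists twice in memory.
    const std::string text = buf.str();
    os.write(text.data(), static_cast<std::streamsize>(text.size()));
}
```

With a multi-megabyte argument, both the string stream and the string copy hold the whole text at once, which matches the memory overhead described above. Whether a single write() call is atomic with respect to other threads still depends on the underlying stream.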
If you're just using it for logging, and not, say, dumping huge amounts of XML, you'll never see this problem.
The only other "catch" I'm seeing is:
Highly portable. It will work with all good modern C++ compilers; it even works with Visual C++ 6!
So it won't work with an old C++ compiler like cfront, whereas iostreams is backward compatible to the late 1980s. Again, I'd be surprised if anyone ever had a problem with this.
Although FastFormat is a good library, there are a number of issues with it:
Limited formatting support, in particular the following features are not supported:
Leading zeros (or any other non-space padding)
Octal/hexadecimal encoding
Runtime width/alignment specification
The library is quite big for the relatively small task of formatting, and it has an even bigger dependency (STLSoft).
It looks pretty interesting indeed! Good tip regardless, and +1 for that!
I've been playing with it for a bit. The main drawback I see is that FastFormat supports fewer formatting options for the output. I think this is a direct consequence of the way the higher type safety is achieved, and a good tradeoff depending on your circumstances.
If you look in detail at the author's performance benchmark page, you'll notice that the good old C printf-family functions still win on Linux. In fact, the only test case where they perform poorly is static string concatenation, where I would expect printf to be wasteful. Moreover, GCC provides static type checking of printf-style calls, so the type-safety benefit is smaller. So if you are running on Linux and need the absolute best performance, FastFormat is probably not the optimal solution.
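The static type checking mentioned above is also available for your own printf-style wrappers through the GCC (and Clang) format attribute. The sketch below is minimal. log_printf and PRINTF_LIKE are hypothetical names, and on other compilers the macro expands to nothing.

```cpp
#include <cstdarg>
#include <cstdio>

// GCC and Clang understand the format attribute; elsewhere this is a no-op.
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define PRINTF_LIKE(fmt_index, first_arg)
#endif

// Hypothetical wrapper: argument 1 is the format string, varargs start at 2.
int log_printf(const char* fmt, ...) PRINTF_LIKE(1, 2);

int log_printf(const char* fmt, ...) {
    std::va_list args;
    va_start(args, fmt);
    const int written = std::vprintf(fmt, args);  // number of characters printed
    va_end(args);
    return written;
}
```

With -Wformat (enabled by -Wall), a call such as log_printf("%d\n", "text") produces a compile-time warning, just as it would for printf itself.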
The library depends on a couple of environment variables, as mentioned in the docs.
That might be no biggie to some people, but I'd prefer my code to be as self-contained as possible. If I check it out from source control, it should just compile and work. It won't if it requires you to set environment variables.