compile-time string hashing

compile-time string hashing - c++

I need to use a string as the ID to obtain some object. I implemented this at run time, and it works well. But this makes static type checking impossible, for obvious reasons.
I've Googled for an algorithm that calculates the hash of a string at compile time: C++ compile-time string hashing with Boost.MPL.
It seems to be the perfect solution to my problem, except that the string passed to the algorithm has to be split into pieces of 4 characters, or character by character, for obvious reasons.
I.e., instead of writing the IDs the usual way, I'll have to write them like this:
hash_cstring<boost::mpl::string<'obje', 'ct.m', 'etho', 'd'>>::value
This is absolutely unusable.
The question is: how can I pass a string such as "object.method" to this algorithm directly?
Thank you all.

Solution with gcc-4.6:
#include <iostream>
template<size_t N, size_t I=0>
struct hash_calc {
static constexpr size_t apply (const char (&s)[N]) {
return (hash_calc<N, I+1>::apply(s) ^ s[I]) * 16777619u;
};
};
template<size_t N>
struct hash_calc<N,N> {
static constexpr size_t apply (const char (&s)[N]) {
return 2166136261u;
};
};
template<size_t N>
constexpr size_t hash ( const char (&s)[N] ) {
return hash_calc<N>::apply(s);
}
int main() {
char a[] = "12345678";
std::cout << std::hex << hash(a) << std::endl;
std::cout << std::hex << hash("12345678") << std::endl;
}
http://liveworkspace.org/code/DPObf
I'm happy!
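One thing a compile-time hash like the one above makes possible is using string IDs as case labels. The following is only a sketch under assumptions: it uses the 32-bit FNV-1a variant rather than the exact formula from the answer, it needs a C++11 compiler, and the names fnv1a, str_id and dispatch are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// 32-bit FNV-1a, written recursively so that it is a valid C++11 constexpr function.
constexpr std::uint32_t fnv1a(const char* s, std::size_t n,
                              std::uint32_t h = 2166136261u) {
    return n == 0
        ? h
        : fnv1a(s + 1, n - 1, (h ^ static_cast<unsigned char>(*s)) * 16777619u);
}

// Hash a string literal; N - 1 drops the terminating '\0'.
template <std::size_t N>
constexpr std::uint32_t str_id(const char (&s)[N]) {
    return fnv1a(s, N - 1);
}

// Hypothetical dispatcher: the hashes are constant expressions, so they can be case labels.
inline int dispatch(std::uint32_t key) {
    switch (key) {
        case str_id("object.method"): return 1;
        case str_id("object.other"):  return 2;
        default:                      return 0;
    }
}

// Published FNV-1a test vector, checked by the compiler.
static_assert(str_id("a") == 0xE40C292Cu, "FNV-1a hash of \"a\"");
```

If two IDs ever hashed to the same value, the duplicate case label would be rejected at compile time, which is a useful side effect of this design.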

I don't know of a way to do this with the preprocessor or with templates. I suspect your best bet is to create a separate pre-compile step (say, with Perl or similar) to generate the hash_cstring statements from a set of source statements. Then at least you don't have to split the strings manually when you add new ones, and the generation is fully automated and repeatable.

Templates can be instantiated with the address of any object that has external linkage, therefore this should work as expected:
extern char const object_method[] = "object.method";
... = hash_cstring<object_method>::value;
(given the template hash_cstring<> is able to deal with pointer values).

In case anyone is interested, I walk through how to create a compile time hash of Murmur3_32 using C++11 constexpr functions and variadic templates here:
http://roartindon.blogspot.sg/2014/10/compile-time-murmur-hash-in-c.html
Most of the examples I've seen deal with hashes that are based on consuming one character of the string at a time. The Murmur3_32 hash is a bit more interesting in that it consumes 4 characters at a time and needs some special case code to handle the remaining 0, 1, 2 or 3 bytes.

Related

C++ What is wrong about using this approach instead of enums when I want a string representation?

There are several questions around concerning this topic (e.g. here and here). I am a bit surprised how lengthy the proposed solutions are. Also, I am a bit lazy and would like to avoid maintaining an extra list of strings for my enums.
I came up with the following and I wonder if there is anything fundamentally wrong with my approach...
class WEEKDAY : public std::string{
public:
static const WEEKDAY MONDAY() {return WEEKDAY("MONDAY");}
static const WEEKDAY TUESDAY(){return WEEKDAY("TUESDAY");}
/* ... and so on ... */
private:
WEEKDAY(std::string s):std::string(s){};
};
Still I have to type the name/string representation more than once, but at least now it's all in a single line for each possible value, and in total it does not take many more lines than a plain enum. Using these WEEKDAYS looks almost identical to using enums:
bool isAWorkingDay(WEEKDAY w){
if (w == WEEKDAY::MONDAY()){return true;}
/* ... */
return false;
}
and it's straightforward to get the "string representation" (well, in fact it is just a string)
std::cout << WEEKDAY::MONDAY() << std::endl;
I am still relatively new to C++ (not in writing but in understanding ;), so maybe there are things that can be done with enums that cannot be done with such kind of constants.

You could use the preprocessor to avoid duplicating the names:
#define WEEKDAY_FACTORY(DAY) \
static const WEEKDAY DAY() {return WEEKDAY(#DAY);}
WEEKDAY_FACTORY(MONDAY)
WEEKDAY_FACTORY(TUESDAY)
// and so on
Whether the deduplication is worth the obfuscation is a matter of taste. It would be more efficient to use an enumeration rather than a class containing a string in most places; I'd probably do that, and only convert to a string when needed. You could use the preprocessor to help with that in a similar way:
char const * to_string(WEEKDAY w) {
switch (w) {
#define CASE(DAY) case DAY: return #DAY;
CASE(MONDAY)
CASE(TUESDAY)
// and so on
}
return "UNKNOWN";
}

C# String.Format with Parameters standard equivalent in C++?

I have a lot of C# code that I have to rewrite in C++. I don't have much experience in C++.
I am using Visual Studio 2012 to build. The project is a static library in C++ (not in C++/CLI).
In many places they were using String.Format, like this:
C#
String.Format("Some Text {0}, some other Text {1}", parameter0, parameter1);
Now, I know similar things have been asked before, but it is not clear to me what the most standard/safe way to do this is.
Would it be safe to use something like sprintf or printf? I read some people saying they are not standard. Something like this? (Would this be the C++ way, or is it more the C way?)
C++ (or is it C?)
char buffer [50];
int n, a=5, b=3;
n=sprintf (buffer, "Some Text %d, some other Text %d", a, b);
Other people suggested to do your own class, and I saw many different implementations.
For the time being, I have a class that uses std::to_string, ostringstream, std::string.replace and std::string.find, with templates. My class is rather limited, but for the cases I have in the C# code, it works. Now I don't know whether this is the most efficient way (or even correct at all):
C++
template <typename T>
static std::string ToString(T Number)
{
std::ostringstream stringStream;
stringStream << Number;
std::string string = stringStream.str();
return string;
};
template <typename T,unsigned S>
static std::string Format(const std::string& stringValue, const T (&parameters)[S])
{
std::string stringToReturn = std::string(stringValue);
for (int i = 0; i < S; ++i)
{
std::string toReplace = "{"+ std::to_string(i) +"}";
size_t f = stringToReturn.find(toReplace);
if(std::string::npos != f)
stringToReturn.replace(f, toReplace.length(), ToString(parameters[i]));
}
return stringToReturn;
};
//I have some other overloads that call the Format function that receives an array.
template <typename T>
static std::string Format(const std::string& stringValue, const T parameter, const T parameter2)
{
T parameters[] = {parameter, parameter2};
return Format(stringValue, parameters);
};
And I need my code to work both in Linux and Windows, so I need different compilers to be able to build it, that is why I need to be sure I am using a standard way. And my environment can not be updated so easily, so I can not use C++11. I can not use Boost either, because I can not be sure I will be able to add the libraries in the different environments I need it to work.
What is the best approach I can take in this case?

Here's a 1-header library I've been writing just for that purpose: fakeformat
Test:
REQUIRE(ff::format("{2}ff{1}").with('a').also_with(7).now()=="7ffa");
The library is configurable, so that you can start parameter indexing from 0. You can also write a wrapper, so that it would look exactly like String.Format.
It builds on Linux and doesn't need C++11.
There's no standard way yet...
Or, you could use Boost.Locale formatting
Here it is, with indices starting from 0:
#include ...
struct dotnet_config {
static const char scope_begin='{';
static const char scope_end='}';
static const char separator=',';
static const char equals='=';
static const size_t index_begin=0;
static bool string_to_key(std::string const& to_parse,int& res) {
std::istringstream ss(to_parse);
ss.imbue(std::locale::classic());
ss >> res;
if (!ss.fail() && ss.eof())
return true;
return false;
}
};
template <typename T1>
std::string Format (std::string const& format_string,T1 p1) {
return ff::formatter<dotnet_config>(format_string).with(p1).now();
}
template <typename T1,typename T2>
std::string Format (std::string const& format_string,T1 p1,T2 p2) {
return ff::formatter<dotnet_config>(format_string).with(p1).with(p2).now();
}
int main() {
std::cout<<Format("test={0}",42)<<std::endl;
std::cout<<Format("{0}!={1}",33,42)<<std::endl;
return 0;
}
Output:
test=42
33!=42

sprintf works if all you have are non-object types (or you manually convert them to C-strings, or convert them to strings and then call the c_str() member function). You may want the extra protection against buffer overflow that snprintf provides.
If you're willing to learn more to do what you have to, you can use the Boost Format library. I'm sure you can write a script to convert String.format calls to Boost's syntax.
If you can't use Boost, and you can't use C++11, you have to go with sprintf and be careful about buffer overflow (possibly snprintf if you can rely on your compiler having it). You might want to write a script to wrap all the parameters so that they all convert to strings:
String.Format("Some Text {0}, some other Text {1}", to_printf(p0), to_printf(p1));
Also, note that C's format doesn't use braces. So that's a big problem. You may need to implement your own variadic function.
If everything is simple like {0}, you can probably write a script to replace most instances of String.Format (and none of the more complicated ones) with something like
`mystring = "Some Text "+tostring(p0)+", some other Text "+tostring(p1);`
which wouldn't be the most efficient way, but most likely won't matter unless you're doing thousands of formats per second. Or possibly slightly more efficient (no intermediate strings):
`mystring = static_cast<std::ostringstream&>(std::ostringstream().flush() << "Some Text " << p0 << ", some other Text " << p1).str();`,
which creates a temporary. The flush() call returns an lvalue reference to the stream, so the compiler no longer treats it as a temporary, and that solves a specific problem: non-member operator<< overloads cannot bind to a temporary stream.
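For the simple {0}/{1} case discussed above, a placeholder-replacing helper can be written with nothing beyond the C++03 standard library. The sketch below is only an illustration: to_text, replace_all and format2 are hypothetical names, it handles exactly two arguments, and it does not support alignment or format specifiers such as {0:X}.

```cpp
#include <sstream>
#include <string>

// Convert anything streamable to a std::string.
template <typename T>
std::string to_text(const T& value) {
    std::ostringstream oss;
    oss << value;
    return oss.str();
}

// Replace every occurrence of token in text with replacement.
inline void replace_all(std::string& text, const std::string& token,
                        const std::string& replacement) {
    std::string::size_type pos = 0;
    while ((pos = text.find(token, pos)) != std::string::npos) {
        text.replace(pos, token.length(), replacement);
        pos += replacement.length();  // continue after the inserted text
    }
}

// Two-argument formatter; the arguments may have different types.
template <typename A, typename B>
std::string format2(std::string text, const A& a, const B& b) {
    replace_all(text, "{0}", to_text(a));
    replace_all(text, "{1}", to_text(b));
    return text;
}
```

A call such as format2("Some Text {0}, some other Text {1}", 5, "x") yields "Some Text 5, some other Text x". One known limitation of this two-pass approach: if the text of the first argument itself contains "{1}", the second pass will replace it as well.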

Why don't you use the << operator to format your string?
string strOutput;
stringstream strn;
int i = 10;
float f = 20.0f;
strn << "Sally scored "<<i<< " out of "<<f << ". She failed the test!";
strOutput = strn.str(); // operator>> would stop at the first space
cout << strOutput;

Using templates for implementing a generic string parser

I am trying to come up with a generic solution for parsing strings (with a given format). For instance, I would like to be able to parse a string containing a list of numeric values (integers or floats) and return a std::vector. This is what I have so far:
template<typename T, typename U>
T parse_value(const U& u) {
throw std::runtime_error("no parser available");
}
template<typename T>
std::vector<T> parse_value(const std::string& s) {
std::vector<std::string> parts;
boost::split(parts, s, boost::is_any_of(","));
std::vector<T> res;
std::transform(parts.begin(), parts.end(), std::back_inserter(res),
[](const std::string& s) { return boost::lexical_cast<T>(s); });
return res;
}
Additionally, I would like to be able to parse strings containing other type of values. For instance:
struct Foo { /* ... */ };
template<>
Foo parse_value(const std::string& s) {
/* parse string and return a Foo object */
}
The reason to maintain a single "hierarchy" of parse_value functions is because, sometimes, I want to parse an optional value (which may exist or not), using boost::optional. Ideally, I would like to have just a single parse_optional_value function that would delegate on the corresponding parse_value function:
template<typename T>
boost::optional<T> parse_optional_value(const boost::optional<std::string>& s) {
if (!s) return boost::optional<T>();
return boost::optional<T>(parse_value<T>(*s));
}
So far, my current solution does not work (the compiler cannot deduce the exact function to use). I guess the problem is that my solution relies on deducing the template value based on the return type of parse_value functions. I am not really sure how to fix this (or even whether it is possible to fix it, since the design approach could just be totally flawed). Does anyone know a way to solve what I am trying to do? I would really appreciate if you could just point me to a possible way to address the issues that I am having with my current implementation. BTW, I am definitely open to completely different ideas for solving this problem too.

You cannot overload functions based on return value [1]. This is precisely why the standard IO library uses the construct:
std::cin >> a >> b;
which may not be your cup of tea -- many people don't like it, and it is truly not without its problems -- but it does a nice job of providing a target type to the parser. It also has the advantage over a static parse<X>(const std::string&) prototype that it allows for chaining and streaming, as above. Sometimes that's not needed, but in many parsing contexts it is essential, and the use of operator>> is actually a pretty cool syntax. [2]
The standard library doesn't do what would be far and away the coolest thing, which is to skip string constants scanf style and allow interleaved reading.
vector<int> integers;
std::cin >> "[" >> interleave(integers, ",") >> "]";
However, that could be defined. (Possibly it would be better to use an explicit wrapper around the string literals, but actually I prefer it like that; but if you were passing a variable you'd want to use a wrapper).
[1] With the new auto declaration, the reason for this becomes even clearer.
[2] IO manipulators, on the other hand, are a cruel joke. And error handling is pathetic. But you can't have everything.
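Since functions cannot be overloaded on their return type, and function templates cannot be partially specialized, one common workaround is to delegate to a class template, which can be. The following is a sketch of that idea, not something from the answer above: parser is a hypothetical name, and it uses std::istringstream and std::getline in place of the Boost calls from the question.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Primary template: read one value of type T with operator>>.
template <typename T>
struct parser {
    static T parse(const std::string& s) {
        std::istringstream in(s);
        T value;
        if (!(in >> value)) throw std::runtime_error("cannot parse: " + s);
        return value;
    }
};

// Partial specialization: a comma-separated list of T.
template <typename T>
struct parser<std::vector<T> > {
    static std::vector<T> parse(const std::string& s) {
        std::vector<T> out;
        std::istringstream in(s);
        std::string part;
        while (std::getline(in, part, ','))
            out.push_back(parser<T>::parse(part));  // reuse the element parser
        return out;
    }
};

// Single entry point; the caller names the target type explicitly.
template <typename T>
T parse_value(const std::string& s) {
    return parser<T>::parse(s);
}
```

With this layout, parse_value<std::vector<int> >("1,2,3") and parse_value<double>("2.5") both resolve without ambiguity, and a user-defined type such as Foo would get its own full specialization of parser. A parse_optional_value wrapper like the one in the question could then call parse_value<T>(*s) unchanged.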

Here is an example of libsass parser:
const char* interpolant(const char* src) {
return recursive_scopes< exactly<hash_lbrace>, exactly<rbrace> >(src);
}
// Match a single character literal.
// Regex equivalent: /(?:x)/
template <char chr>
const char* exactly(const char* src) {
return *src == chr ? src + 1 : 0;
}
where rules could be passed into the lex method.

Reviewing C++: Int to Char

It's been a while since I have worked with C++, I'm currently catching up for an upcoming programming test. I have the following function that has this signature:
void MyIntToChar(int *arrayOfInt,char* output)
arrayOfInt is an array of integers and char* output is a buffer that should be long enough to hold the string representation of the integers that the function receives.
Here is an example of the usage of such function:
int numbers[3] = {11, 26, 81};
char* output = ""; // this I'm sure is not valid, any suggestions on how to
// to properly initialize this string?
MyIntToChar(numbers,output);
cout << output << endl; // this should print "11 26 81" or "11, 26, 81".
// i.e. formatting should not be a problem.
I have been reviewing my old c++ notes from college, but I keep having problems with these. I'm hating myself right now for going to the Java world and not working in this.
Thanks.

void MyIntToChar(int *arrayOfInt, char* output);
That's wrong in several ways. First of all, it's a misnomer. You cannot, in general, convert an integer into one character, because only ten of all integers (0...9) would fit into one. So I will assume you want to convert integers into strings instead.
Then, if you pass arrays to functions, they decay to pointers to their first element, and all information about the array's size is lost. So when you pass arrays to function, you need to pass size information, too.
Either use the C way of doing this and pass in the number of elements as std::size_t (to be obtained as sizeof(myarray)/sizeof(myarray[0])):
void MyIntToStr(int *arrayOfInt, std::size_t arraySize, char* output);
Or do it the C++ way and pass in two iterators, one pointing at the first element (so-called begin iterator) and the other pointing to one behind the last (end iterator):
void MyIntToStr(int *begin, int *end, char* output);
You can improve on that by not insisting on the iterators being int*, but anything which, when dereferenced, yields an int:
template< typename FwdIt >
void MyIntToStr(FwdIt begin, FwdIt end, char* output);
(Templates would require you to implement the algorithm in a header.)
Then there's the problems with the output. First of all, do you really expect all the numbers to be written into one string? If so, how should they be separated? Nothing? Whitespace? Comma?
Or do you expect an array of strings to be returned?
Assuming you really want one string, if I pass the array {1, 2, 3, 4, 5} into your function, it needs space for five single-digit integers plus the space needed for four separators. Your function signature suggests you want me to allocate that upfront, but frankly, if I have to calculate this myself, I might just as well do the conversions myself. Further, I have no way of telling you how much memory that char* points to, so you can't check whether I was right. As generations of developers have found out, this is so hard to get right every time, that several computer languages have been invented to make things easier for programmers. One of those is C++, which nowadays comes with a dynamically resizing string class.
It would be much easier (for you and for me) if I could pass you a string and you wrote into that:
template< typename FwdIt >
void MyIntToChar(FwdIt begin, FwdIt end, std::string& output);
Note that this passes the string per non-const reference. This allows you to modify my string and lets me see the changes you made.
However, once we're doing this, you might just as well return a new string instead of requiring me to pass one to you:
template< typename FwdIt >
std::string MyIntToChar(FwdIt begin, FwdIt end);
If, however, you actually wanted an array of strings returned, you shouldn't take one string to write to, but a means where to write them to. The naive way of doing this would be to pass a dynamically re-sizable array of dynamically re-sizable string. In C++, this is spelled std::vector<std::string>:
template< typename FwdIt >
void MyIntToStr(FwdIt begin, FwdIt end, std::vector<std::string>& output);
Again, it might be better to return such an array (although some would disagree, since copying an array of strings might be considered too expensive). However, the best way to do this would not require me to accept the result in the form of a std::vector. What if I needed the strings in a (linked) list instead? Or written to some stream?
The best way to do this would be for your function to accept an output iterator to which you write your result:
template< typename FwdIt, typename OutIt >
void MyIntToStr(FwdIt begin, FwdIt end, OutIt output);
Of course, now that's so general that it's hard to see what it does, so it's good we gave it a good name. However, looking at it I immediately think that this should build on another function which is needed probably even more than this one: A function that takes one integer and converts it to one string. Assuming that we have such a function:
std::string MyIntToStr(int i);
it's very easy to implement the array versions:
template< typename FwdIt, typename OutIt >
void MyIntToStr(FwdIt begin, FwdIt end, OutIt output)
{
while(begin != end)
*output++ = MyIntToStr(*begin++);
}
Now all that remains for you to be done is to implement that std::string MyIntToStr(int i); function. As someone else already wrote, that's easily done using string streams and you shouldn't have a problem to find some good examples for that. However, it's even easier to find bad examples, so I'd rather give you one here:
std::string MyIntToStr(int i)
{
std::ostringstream oss;
oss << i;
if(!oss) throw "bah!"; // put your error reporting mechanism here
return oss.str();
}
Of course, given templates, that's easy to generalize to accepting anything that's streamable:
template< typename T >
std::string MyIntToStr(const T& obj)
{
std::ostringstream oss;
oss << obj;
if(!oss) throw "bah!"; // put your error reporting mechanism here
return oss.str();
}
Note that, given this general function template, the MyIntToStr() working on arrays now automatically works on arrays of any type the function template working on one object works on.
So, at the end of this (rather epic, I apologize) journey, this is what we arrived at: a generalized function template to convert anything (which can be written to a stream) into a string, and a generalized function template to convert the contents of any array of objects (which can be written to a stream) into strings.
Note that, if you had at least a dozen 90mins lectures on C++ and your instructors failed to teach you enough to at least understand what I've written here, you have not been taught well according to modern C++ teaching standards.

Well an integer converted to string will require a max of 12 bytes including sign (assuming 32bit), so you can allocate something like this
char * output= new char[12*sizeof(numbers)/sizeof(int)];

First of all, it is impossible to use your method that way your example says:
char* output has only a size of 1 byte (don't forget the null-terminator '\0'). So you can't put a whole string in it. You will get segmentation faults. So, here you are going to make use of the heap. This is already implemented in std::string and std::stringstream. So use them for this problem.
Let's have a look:
#include <string>
#include <sstream>
#include <iostream>
std::string intArrayToString(int *integers, int numberOfInts)
{
std::stringstream ss;
for (int i = 0; i < numberOfInts; i++)
{
ss << integers[i] << ", ";
}
std::string temp = ss.str();
return temp.empty() ? temp : temp.substr(0, temp.size() - 2); // Cut off the trailing ", "
}
And if you want to convert it to char*, you can use yourString.c_str();

Here's a possibility if you are willing to reconsider a change in prototype of the function
template<int n>
void MyIntToChar(int (&iarr)[n], string &output){
stringstream ss;
for(size_t id = 0; id < n; ++id){
ss << iarr[id];
if(id != n - 1) ss << " ";
}
output = ss.str();
}
int main(){
int numbers[3] = {11, 26, 81};
string out = "";
MyIntToChar(numbers, out);
}

You should take a look at the std::stringstream, or, more C-ish (as char* type instead of strings might suggest) sprintf.

Did you try sprintf()? It will do the job. To initialize the char*, you either have to allocate memory by calling malloc, or you can declare a char array and pass its address to the function rather than a value.

Sounds like you would like to use C. Here's an example. There's an extra ", " at the end of the output, but it should give you a feel for the concept. I changed the return type so that I would know how many bytes of output were used (the alternative would be to initialize output with nulls), and I pass the element count explicitly, because sizeof applied to a pointer parameter yields the pointer's size, not the array's length.
#include <stdio.h>
int MyIntToChar(const int *arrayOfInt, int count, char* output) {
int bytes_used = 0; // offset past what has already been written
for (int i = 0; i < count; ++i)
bytes_used += sprintf(output + bytes_used, "%d, ", arrayOfInt[i]);
return bytes_used;
}
int main() {
int numbers[5] = {5, 2, 11, 26, 81};
int count = sizeof(numbers) / sizeof(numbers[0]);
char output[5 * 13 + 1]; // up to 11 chars per 32-bit int plus ", " for each, plus '\0'
int bytes_used = MyIntToChar(numbers, count, output);
printf("%.*s\n", bytes_used, output); // prints "5, 2, 11, 26, 81, "
}

Making a method template - C++

Below code is used to get a std::string representation from ASCII code.
string Helpers::GetStringFromASCII(const int asciiCode) const
{
return string(1,char(asciiCode));
}
It works well. But in my application, I know the ASCII codes at compile time. So I will be calling it like
string str = GetStringFromASCII(175); // I know 175 at compile time
Question
Is there any way to make the GetStringFromASCII method a template so that the processing happens at compile time and I can avoid calling the function each time at runtime?
Any thoughts?

This kind of template meta programming works well when you're dealing with primitive data types like ints and floats. If you necessarily need a string object, you can't avoid calling the std::string constructor and there's no way that call can happen at compile time. Also, I don't think you can drag the cast to char to compile time either, which, in all, means that templates cannot help you here.

Instead of feeding an int constant to a string conversion function, use a string constant directly:
string str("\xAF"); // 0xAF = 175
By the way, except for heavy performance needs in a loop, trading code readability for some CPU cycles is rarely cost-effective overall.

Why are you even bothering with a helper function?
string s( 1, char(175) );
That's all you need and it's the quickest you're going to get.

How about something like this:
#include <iostream>
#include <string>
using namespace std;
template <int asciiCode>
inline string const &getStringFromASCII()
{
static string s(1,char(asciiCode));
return s;
}
int main(int, char const**) {
cout << getStringFromASCII<65>() << endl;
}
EDIT: returns a ref now
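A variation on the template idea above: the character data itself can be a constant-initialized static array, so that only the std::string construction (if one is needed at all) happens at run time. This is a sketch with hypothetical names (ascii_literal, ascii_string); for codes above 127 the resulting char value depends on whether char is signed on the platform.

```cpp
#include <string>

// One two-character array per code: the character plus the terminating '\0'.
template <int Code>
struct ascii_literal {
    static const char value[2];
};

// Out-of-class definition; the initializer is a constant expression,
// so the array is initialized statically, before any code runs.
template <int Code>
const char ascii_literal<Code>::value[2] = { static_cast<char>(Code), '\0' };

// Convenience wrapper for callers that really need a std::string.
template <int Code>
std::string ascii_string() {
    return std::string(ascii_literal<Code>::value, 1);
}
```

Code that only needs a const char* can use ascii_literal<65>::value directly and skip the string object entirely.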
