So I can't figure out how to do this in C++. I need to do a modulus operation and integer conversion on data that is 96 bits in length.
Example:
struct Hash96bit
{
    char x[12];
};

int main()
{
    Hash96bit n;
    // set n to something
    int size = 23;
    int result = n % size; // this is what I can't figure out how to do
}
Edit: I'm trying to have a 96-bit hash because I have 3 floats which, when combined, create a unique combination. I thought that would be best to use as the hash because you don't really have to process it at all.
Edit: Okay... so at this point I might as well explain the bigger issue. I have a 3D world that I want to subdivide into sectors, so that groups of objects can be placed in sectors, which would make frustum culling and physics iterations take less time. So at the beginning, let's say you are at sector 0,0,0. Sure, we store them all in an array, cool, but what happens when we get far away from 0,0,0? We don't care about those sectors anymore. So we use a hashmap, since memory isn't an issue and because we will be accessing data with sector values rather than handles. Now a sector is 3 floats; hashing that could easily be done with any number of algorithms. I thought it might be better if I could just say the 3 floats together are the key and go from there; I just needed a way to mod a 96-bit number to fit it in the data segment. Anyway, I think I'm just gonna take the bottom bits of each of these floats and use a 64-bit hash unless anyone comes up with something brilliant. Thank you for the advice so far.
UPDATE: Having just read your second edit to the question, I'd recommend you use David's Jenkins approach (which I upvoted a while back)... just point it at the lowest byte in your struct of three floats.
Regarding "Anyway I think i'm just gonna take the bottom bits of each of these floats" - again, the idea with a hash function used by a hash table is not just to map each bit in the input (less till some subset of them) to a bit in the hash output. You could easily end up with a lot of collisions that way, especially if the number of buckets is not a prime number. For example, if you take 21 bits from each float, and the number of buckets happens to be 1024 currently, then after % 1024 only 10 bits from one of the floats will be used with no regard to the values of the other floats... hash(a,b,c) == hash(d,e,c) for all c (it's actually a little worse than that - values like 5.5, 2.75 etc. will only use a couple bits of the mantissa....).
Since you're insisting on this (though it's very likely not what you need, and a misnomer to boot):
#include <cstdint>

typedef unsigned __int128 uint128_t; // non-standard: GCC/Clang extension

struct Hash96bit
{
    union {
        float f[3];
        char x[12];
        uint32_t u[3];
    };

    Hash96bit(float a, float b, float c)
    {
        f[0] = a;
        f[1] = b;
        f[2] = c;
    }

    // the operator will support your "int result = n % size;" usage...
    operator uint128_t() const
    {
        return u[0] * ((uint128_t)1 << 64) + // arbitrary ordering
               u[1] * ((uint128_t)1 << 32) +
               u[2];
    }
};
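A minimal usage sketch matching the question (arbitrary values; assumes the GCC/Clang typedef above):

int main()
{
    Hash96bit n(1.0f, 2.5f, -3.0f); // hypothetical values
    int size = 23;
    int result = (int)((uint128_t)n % size); // the conversion operator makes the modulus work
    return result;
}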
You can use the Jenkins one-at-a-time hash:

#include <cstdint>
#include <cstddef>

uint32_t jenkins_one_at_a_time_hash(const char *key, size_t len)
{
    uint32_t hash = 0;
    for (size_t i = 0; i < len; ++i)
    {
        hash += key[i]; // mix in one byte
        hash += (hash << 10);
        hash ^= (hash >> 6);
    }
    hash += (hash << 3); // final avalanche
    hash ^= (hash >> 11);
    hash += (hash << 15);
    return hash;
}
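Applied to the question's three floats, that might look like this (a sketch; SectorKey is a hypothetical name for your struct of three floats):

#include <cstdint>

struct SectorKey { float x, y, z; }; // hypothetical struct of the three sector floats

uint32_t hash_sector(SectorKey k)
{
    // point the hash at the lowest byte of the struct, as suggested above
    return jenkins_one_at_a_time_hash((const char *)&k, sizeof k);
}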
I have an odd structure with 5 fields of bit length 12 and 4 boolean flags stored in the high bits. This all fits nicely into a 64-bit long, and as such they are stored as a 64-bit word array. What I want to do is search the array and find whether any of the 12-bit fields is set to a given value.
I have tried the obvious solution of using bit shifts and masks; however, this is a very hot function and needs to be optimized for speed. This led me to this page containing a way to check for a byte in a word in very few operations. This makes me think it is possible to do something similar with the 12-bit fields; however, I am struggling to find what constants I would replace the ones given on that page with.
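For reference, the obvious version I'm trying to beat looks something like this (a sketch of the shift-and-mask approach):

#include <cstdint>

// naive baseline: test each of the 5 twelve-bit fields in turn
bool hasValueNaive(uint64_t word, uint64_t n)
{
    for (int i = 0; i < 5; ++i)
        if (((word >> (12 * i)) & 0xFFF) == n)
            return true;
    return false;
}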
I'm not very versed in low level languages, but I'm in the mood to fiddle with some bits so I thought I'd give it a try.
POC: JS can't do 64-bit longs, but we can check whether we can adapt the algorithm to deal with 2x12-bit fields + 8 boolean flags (noise) in a 32-bit (u)int.
The noise is there because the original algorithm dealt with exactly 4 bytes and no further bits, but neither 32 nor 64 is divisible by 12, so we need to ensure that these additional bits don't interfere, or worse, get matched.
function hasValue(x, n) { return hasZero(x ^ (0x001001 * n)); }
function hasZero(v) { return ((v - 0x001001) & ~(v) & 0x800800); }
function hex(v) { return "0x" + v.toString(16) }
// create a random value, 2x12bit fields plus 8 random flags.
var v = Math.floor(Math.random() * 0x100000000);
console.log("value", hex(v));
// get the two fields
var a = v & 0xFFF;
console.log("check", hex(a), !!hasValue(v, a));
var b = (v >> 12) & 0xFFF;
console.log("check", hex(b), !!hasValue(v, b));
// brute force.
// check if any other value is matched.
// these should only return the 2 values from above.
for (var i = 0; i < 0x1000; ++i) {
if (hasValue(v, i)) {
console.log("matched", hex(i));
}
}
Extrapolating from this, your solution should be:
#define hasValue(x,n) hasZero(x ^ (0x001001001001001 * n))
#define hasZero(v) ((v - 0x001001001001001) & ~(v) & 0x800800800800800)
where all values are unsigned 64-bit integers. (In C/C++ you'd annotate the constants with a ULL suffix to make sure they're treated as 64-bit.)
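In C++ that sketch would be (the ULL suffixes keep the constants 64-bit; the 4 flag bits sit above bit 59 and never reach the test mask):

#include <cstdint>

// 5 twelve-bit fields in bits 0..59; 4 boolean flags in bits 60..63
inline uint64_t hasZero(uint64_t v)
{
    return (v - 0x001001001001001ULL) & ~v & 0x800800800800800ULL;
}

inline uint64_t hasValue(uint64_t x, uint64_t n) // n is the 12-bit value sought
{
    return hasZero(x ^ (0x001001001001001ULL * n));
}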
In a question about a very simple hashing algorithm called djb2, the author wants to know why the number 33 is chosen in the algorithm (see the C code below).
unsigned long
hash(unsigned char *str)
{
    unsigned long hash = 5381;
    int c;
    while (c = *str++) // just the character
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
    return hash;
}
In the top answer, point 2 talks about the hash accumulator and how it makes two copies of itself, and then it says something about spreading.
Can someone explain what is meant by "copying itself" and the "spread" in point 2 of that answer?
The step 2 being referenced is this:
As you can see from the shift and add implementation, using 33 makes two copies of most of the input bits in the hash accumulator, and then spreads those bits relatively far apart. This helps produce good avalanching. Using a larger shift would duplicate fewer bits, using a smaller shift would keep bit interactions more local and make it take longer for the interactions to spread.
33 is 32+1. That means, thanks to multiplication being distributive, that hash * 33 = (hash * 32) + (hash * 1) - or in other words, make two copies of hash, shift one of them left by 5 bits, then add them together, which is what (hash << 5) + hash expresses in a more direct way.
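A one-liner to convince yourself of the identity (a trivial sketch):

#include <cassert>

int main()
{
    unsigned long hash = 5381;
    assert(((hash << 5) + hash) == hash * 33); // 33 = 32 + 1
}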
I'm working on generating different types of Gradient Noise. One of the things that this noise requires is the generation of random vectors given a position vector.
This position vector could be anything from a single int to a 2D, 3D, or 4D position, etc.
On top of this, an additional "seed" value is needed.
What's required is a hash of these n+1 integers into a unique integer with which I can seed a PRNG. It's important that it's these values as I need to be able to retrieve the original seed every time the same values are used.
So far I've tried an implementation of Fowler–Noll–Vo; but it was way too slow for my purposes.
I've also tried using successive calls to a pairing function:
int pairing_function(int x, int y)
{
    return (0.5 * (x + y) * (x + y + 1) + x);
}
I.e.:
int hash = pairing_function(pairing_function(x,y),seed);
But what seems to happen is that with a large enough seed, the values overflow the size of an int (or even larger types).
What's a good method to achieve what I'm trying to do here? What's important is speed over any cryptographic concerns as well as not returning numbers larger than my original data types.
I'm using C++ but so long as any code is readable I can nut it out.
It is strange that FNV would be way too slow, because it is just one XOR and one integer multiply per byte of data. From Wikipedia, it is "designed to be fast to compute."
If you want something really quick, you can try these implementations, where the multiplication is coded as shifts and additions.
Dan Bernstein's implementation:
unsigned long
hash(unsigned char *str)
{
    unsigned long hash = 5381;
    int c;
    while (c = *str++)
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
    return hash;
}
sdbm implementation (hash(i) = hash(i - 1) * 65599 + str[i]):

static unsigned long
sdbm(unsigned char *str)
{
    unsigned long hash = 0;
    int c;
    while (c = *str++)
        hash = c + (hash << 6) + (hash << 16) - hash;
    return hash;
}
References "Hash Functions" from cse.yorku.ca
It sounds like the FNV implementation you used might have been inefficient because of the way it was used. Here's (I think; I haven't tested it) the same thing written so it can be trivially inlined.
#include <cstdint>

inline uint32_t hash(uint32_t h, uint32_t x) {
    for (int i = 0; i < 4; i++) {
        h ^= x & 255;               // mix in the lowest byte
        x >>= 8;
        h = (h << 24) + h * 0x193;  // same as h *= 0x1000193, the 32-bit FNV prime
    }
    return h;
}
I think calling hash(hash(2166136261, seed), x) or hash(hash(hash(2166136261, seed), x), y) should give you the same result (assuming little-endian) as a library function.
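So for the question's n+1 integers, usage would be roughly (a sketch; 2166136261 is the 32-bit FNV offset basis):

uint32_t seed_for(uint32_t seed, uint32_t x, uint32_t y)
{
    // chain one hash() call per input, starting from the FNV offset basis
    return hash(hash(hash(2166136261u, seed), x), y);
}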
However, to speed that up at the cost of hash quality, you might try a change like this:
inline uint32_t hash(uint32_t h, uint32_t x) {
for (int i = 0; i < 2; i++) {
h ^= x & 65535;
x >>= 16;
h = (h << 24) + h * 0x193;
}
return h;
}
or even:
inline uint32_t hash(uint32_t h, uint32_t x) {
h ^= x;
h = (h << 24) + h * 0x193;
return h;
}
These changes weaken the low-order bits somewhat, so you'll want to follow standard practice in using the high-order bits preferentially. For example, if you need only 16 bits, then shift the final result right by 16 rather than masking it with 0xffff.
The h = ... line will regularly overflow an int, though, and it relies on the standard mod-2**32 behaviour. If that's a problem then you'll want to replace that line with something different and perhaps accept fewer useful bits in your hash. Maybe h = (h >> 4) + (h & 0x7fffff) * 0x193; but that's just a random tweak and I haven't checked it for hash quality.
I will challenge you on
So far I've tried an implementation of Fowler–Noll–Vo; but it was way too slow for my purposes.
as in some simple benchmarks I've done the FNV hash is the fastest. I assume you have benchmarks for all hashes you've tried?
For the benchmark I simply measured the time taken for 1 billion hashes of various algorithms in MSVC++ 2013, using two 32-bit unsigned ints as input:
FNV (32-bit) = 222M hashes/sec
Your pairing_function() = 175M hashes/sec
Simple Hash x + (y << 10) = 170M hashes/sec
Your hash() function using pairing_function() = 167M hashes/sec
Dan Bernstein = 101M hashes/sec
Obviously these are very basic benchmark results and I wouldn't necessarily trust them all that much. I wouldn't be surprised to see some algorithms run faster/slower on different platforms and compilers.
Overall though, while FNV is the fastest in this case there is only a factor of two difference between the fastest and slowest. If this really makes a difference in your case I would suggest taking another look at your problem to see if it can be redesigned to not need the hash or at least reduce the dependence on the hash speed.
Note: I changed your pairing function to:
int pairing_function(int x, int y)
{
    return ((x + y) * (x + y + 1) / 2 + x);
}
for the above benchmarks. Using your version results in a conversion to/from double, which makes it 5x slower and your hash() function 8x slower.
Update
For the FNV hash I found a source online and modified it from there to work directly on 2 integers (assumes a 32-bit integer):
#define FNV_32_PRIME 16777619u
unsigned int FNVHash32(const int input1, const int input2)
{
unsigned int hash = 2166136261u;
const unsigned char* pBuf = (unsigned char *) &input1;
for (int i = 0; i < 4; ++i)
{
hash *= FNV_32_PRIME;
hash ^= *pBuf++;
}
pBuf = (unsigned char *) &input2;
for (int i = 0; i < 4; ++i)
{
hash *= FNV_32_PRIME;
hash ^= *pBuf++;
}
return hash;
}
Since FNV just works on bytes you can extend this to work with any number of integers or other data.
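For example, a buffer-oriented variant along the same lines (a sketch, not part of the original answer):

#include <cstddef>

// hash an arbitrary buffer byte by byte with the same FNV-1 constants
unsigned int FNVHash32Buf(const void *data, size_t len)
{
    unsigned int hash = 2166136261u;
    const unsigned char *p = static_cast<const unsigned char *>(data);
    for (size_t i = 0; i < len; ++i)
    {
        hash *= 16777619u; // FNV_32_PRIME
        hash ^= p[i];
    }
    return hash;
}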
I am trying to convert a binary array to decimal in the following way:
uint8_t array[8] = {1,1,1,1,0,1,1,1};
int decimal = 0;
for (int i = 0; i < 8; i++)
    decimal = (decimal << 1) + array[i];
Actually I have to convert a 64-bit binary array to decimal, and I have to do it a million times.
Can anybody help me: is there any faster way to do the above? Or is the above one fine?
Your method is adequate; to call it "nice" I would just not mix bitwise operations with the "mathematical" way of converting to decimal, i.e. use either
decimal = decimal << 1 | array[i];
or
decimal = decimal * 2 + array[i];
It is important, before attempting any optimisation, to profile the code. Time it, look at the code being generated, and optimise only when you understand what is going on.
And as already pointed out, the best optimisation is to not do something, but to make a higher level change that removes the need.
However...
Most changes you might want to trivially make here, are likely to be things the compiler has already done (a shift is the same as a multiply to the compiler). Some may actually prevent the compiler from making an optimisation (changing an add to an or will restrict the compiler - there are more ways to add numbers, and only you know that in this case the result will be the same).
Pointer arithmetic may be better, but the compiler is not stupid - it ought to already be producing decent code for dereferencing the array, so you need to check that you have not in fact made matters worse by introducing an additional variable.
In this case the loop count is well defined and limited, so unrolling probably makes sense.
Furthermore, it depends on how dependent you want the result to be on your target architecture. If you want portability, it is hard(er) to optimise.
For example, the following produces better code here:
unsigned int x0 = *(unsigned int *)array;
unsigned int x1 = *(unsigned int *)(array+4);
int decimal = ((x0 * 0x8040201) >> 20) + ((x1 * 0x8040201) >> 24);
I could probably also roll a 64-bit version that did 8 bits at a time instead of 4.
But it is very definitely not portable code. I might use that locally if I knew what I was running on and I just wanted to crunch numbers quickly. But I probably wouldn't put it in production code. Certainly not without documenting what it did, and without the accompanying unit test that checks that it actually works.
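For what it's worth, the 64-bit, 8-bits-at-a-time version mentioned above might look like this (a sketch; assumes a little-endian host and that every array element is 0 or 1):

#include <cstdint>
#include <cstring>

int to_decimal64(const uint8_t array[8])
{
    uint64_t v;
    std::memcpy(&v, array, 8); // memcpy sidesteps the aliasing issue of the cast above
    // one multiply gathers one bit from each byte into the top byte, array[0] as MSB
    return static_cast<int>((v * 0x8040201008040201ULL) >> 56);
}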
The binary 'compression' can be generalized as a problem of weighted sum -- and for that there are some interesting techniques.
X mod 255 essentially means summing all the independent 8-bit digits, since 256 ≡ 1 (mod 255).
X mod 254 means summing each digit with a doubling weight, since 1 mod 254 = 1, 256 mod 254 = 2, 256*256 mod 254 = 2*2 = 4, etc.
If the encoding was big endian, then *(unsigned long long *)array % 254 would produce a weighted sum (with the truncated range 0..253). Then removing the value with weight 2 and adding it manually would produce the correct result:
uint64_t a = *(uint64_t *)array;
return (a & ~256) % 254 + ((a >> 7) & 2); // drop the weight-2 byte (bit 8) from the modulus, then add it back
Another mechanism to get the weights is to premultiply each binary digit by 255 and mask the correct bit:
uint64_t a = (*(uint64_t *)array * 255) & 0x0102040810204080ULL; // little endian
uint64_t a = (*(uint64_t *)array * 255) & 0x8040201008040201ULL; // big endian
In both cases one can then take the remainder mod 255 (now correcting with the weight-1 digit):
return (a & 0x00ffffffffffffff) % 255 + (a>>56); // little endian, or
return (a & ~1) % 255 + (a&1);
For the sceptical mind: I actually did profile the modulus version to be (slightly) faster than iteration on x64.
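Putting the little-endian variant together as a complete function (a sketch; memcpy stands in for the pointer cast to stay clear of strict aliasing):

#include <cstdint>
#include <cstring>

int to_decimal_mod255(const uint8_t array[8])
{
    uint64_t bits;
    std::memcpy(&bits, array, 8);                      // little-endian host assumed
    uint64_t a = (bits * 255) & 0x0102040810204080ULL; // premultiply, select one bit per byte
    // bit 56 carries weight 1: exclude it from the modulus, then add it back
    return static_cast<int>((a & 0x00ffffffffffffffULL) % 255 + (a >> 56));
}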
To continue from the answer of JasonD, parallel bit selection can be iteratively utilized.
But first expressing the equation in full form would help the compiler to remove the artificial dependency created by the iterative approach using accumulation:
ret = ((a[0]<<7) | (a[1]<<6) | (a[2]<<5) | (a[3]<<4) |
(a[4]<<3) | (a[5]<<2) | (a[6]<<1) | (a[7]<<0));
vs.
uint32_t HI = *(uint32_t *)array, LO = *(uint32_t *)&array[4];
LO |= (HI << 4);   // the HI dword has a weight 16 relative to LO bytes
LO |= (LO >> 14);  // high word has 4x weight compared to low word
LO |= (LO >> 9);   // high byte has 2x weight compared to lower byte
return LO & 255;
One more interesting technique would be to utilize crc32 as a compression function; then it just happens that the result would be LookUpTable[crc32(array) & 255], as there happens to be no collision within this small subset of 256 distinct arrays. However, to apply that, one has already chosen the road of even less portability and could as well end up using SSE intrinsics.
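With SSE4.2 that could look roughly like this (a sketch; the 256-entry table is hypothetical and would be filled once by running all 256 possible arrays through the same crc):

#include <nmmintrin.h> // _mm_crc32_u64 (SSE4.2; compile with -msse4.2)
#include <cstdint>
#include <cstring>

static uint8_t LookUpTable[256]; // hypothetical: maps crc32(array) & 255 to the decimal value

int to_decimal_crc32(const uint8_t array[8])
{
    uint64_t bits;
    std::memcpy(&bits, array, 8);
    return LookUpTable[_mm_crc32_u64(0, bits) & 255];
}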
You could use std::accumulate, with a doubling-and-adding binary operation:

#include <numeric>

int doubleSumAndAdd(const int& sum, const int& next) {
    return (sum * 2) + next;
}

int decimal = std::accumulate(array, array + ARRAY_SIZE, 0, doubleSumAndAdd);
Like the OP's loop, this treats array[0] as the most significant bit.
Try this; I converted a binary number of up to 1020 digits.
#include <string>
#include <iostream>

using namespace std;

long binary_decimal(string num) /* Function to convert binary to decimal */
{
    long dec = 0, n = 1;
    string bin = num;
    if (bin.length() > 1020) {
        cout << "Binary number too large" << endl;
    }
    else {
        for (int i = bin.length() - 1; i > -1; i--)
        {
            if (bin.at(i) == '1')
                dec += n;
            n *= 2; // next power of two; avoids pow() and its double rounding
        }
    }
    return dec;
}
Theoretically this method will work for a binary number of any length, although in practice a long overflows past 63 bits, so you'd need a big-integer type for anything longer.
Given 3 different bytes, say x = 64, y = 90, z = 240, I am looking to concatenate them into a string like "6490240". It would be lovely if this worked, but it doesn't:
string xx = (string)x + (string)y + (string)z;
I am working in C++, and would settle for a concatenation of the bytes as a 24-bit string using their 8-bit representations.
It needs to be ultra fast because I am using this method on a lot of data, and it seems frustratingly like there isn't a way to just say: treat this byte as if it were a string.
Many thanks for your help
To clarify, the reason why I'm particular about using 3 bytes is because the original data pertains to RGB values which are read via pointers and are stored of course as bytes in memory.
I really want a way to treat each color independently, so you can think of this as a hashing function if you like. So any fast representation that does it without collisions is desired. This is the only way I can think of to avoid any collisions at all.
Did you consider instead just packing the color elements into three bytes of an integer?
uint32_t full_color = (x << 16) | (y << 8) | z;
The easiest way to turn numbers into a string is to use an ostringstream:
#include <sstream>
#include <string>
std::ostringstream os;
os << x << y << z;
std::string str = os.str(); // 6490240
You can even make use of manipulators to do this in hex or octal:
os << std::hex << x << y << z;
Update
Since you've clarified what you really want to do, I've updated my answer. You're looking to take RGB values as three bytes, and use them as a key somehow. This would be best done with a long int, not as a string. You can still stringify the int quite easily, for printing to the screen.
unsigned long rgb = 0;
unsigned char* b = reinterpret_cast<unsigned char*>(&rgb);
b[0] = x;
b[1] = y;
b[2] = z;
// rgb now contains the bytes x, y, z plus zero padding (in some order)
Then you can use the long int rgb as your key, very efficiently. Whenever you want to print it out, you can still do that:
std::cout << std::hex << rgb;
Depending on the endian-ness of your system, you may need to play around with which bytes of the long int you set. My example overwrites bytes 0-2, but you might want to write bytes 1-3. And you might want to write the order as z, y, x instead of x, y, z. That kind of detail is platform dependent. Although if you never want to print the RGB value, but simply want to consider it as a hash, then you don't need to worry about which bytes you write or in what order.
Try sprintf into a char buffer:
char xx[16];
sprintf(xx, "%d%d%d", x, y, z);
Use a 3-character array as your 24-bit representation, and assign each char the value of one of your input values.
Converting 3 bytes to bits and storing the result in an array can be done easily as below:
void bytes2bits(unsigned char x, unsigned char y, unsigned char z, char *res)
{
    res += 24; *res-- = 0;                  // res must point to a 25-char buffer
    unsigned xyz = (x << 16) + (y << 8) + z;
    for (size_t l = 0; l < 24; l++) {
        *res-- = '0' + (xyz & 1);           // emit the bits from right to left
        xyz >>= 1;
    }
}
However, if you are looking for a way to store three byte values in a non-ambiguous and compact way, you should probably settle for hexadecimal (each group of four bits of the binary representation matches a digit 0-9 or a letter A-F). It's ultra simple to encode and decode, and it also gives human-readable output.
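A minimal hexadecimal sketch (assuming snprintf is acceptable):

#include <cstdio>

// 6 hex chars + NUL: unambiguous, compact, human-readable
void bytes2hex(unsigned char x, unsigned char y, unsigned char z, char out[7])
{
    std::snprintf(out, 7, "%02X%02X%02X", x, y, z);
}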
If you never need to print out the result, just combining the values as a single integer and using it as a key, as Mark proposed, is certainly the fastest and simplest solution. Assuming your native integer is 32 bits or more on the target system, just do:
unsigned int key = (x << 16) | (y << 8) | z;
You can as easily get back the initial values from key if needed:
unsigned char x = (key >> 16) & 0xFF;
unsigned char y = (key >> 8) & 0xFF;
unsigned char z = key & 0xFF;