For my implementation of the minhashing algorithm I need to make many random permutations of integers, which will be simulated by using random hash functions (as many as possible). Currently I use hash functions of the form:
h(x) = (a*x + b) % c
where a and b are randomly generated numbers, and c is a prime number bigger than the highest value of b. Anyways, the code runs way too slow and it is impossible to use more than 15 of such hash functions in reasonable running time. Can anyone recommend other ways of using random hash functions for integers in Python? In other posts I came across suggestions for using bitwise shuffling and an XOR operation, but I didn't fully understand how one should implement something like this (I'm relatively new to Python).
Borrowing from my answer to a similar question, and having a quick look at Python documentation to try to guess valid syntax...
The code you posted is OK but it's probably subject to being computed in longer precision than is optimal, and it involves a division which also makes things slow.
To make it faster, you can fix c at a power of two, and you can use binary & (and) instead of modulo, which gives you this:
h(x) = (a * x + b) & ((1 << 32) - 1)
which is the same as:
h(x) = (a * x + b) & (4294967296 - 1)
which is the same as:
h(x) = (a * x + b) % 4294967296
and you must ensure that a is an odd number (this is all that's needed to make it co-prime with c when c is a power of two). This example limits the output range to a 32-bit integer. You can change that as you see fit. I don't know what Python's limits are.
If you want more parameterisation than that, or you discover that the results aren't "random" enough (it would fail statistical tests very quickly, but that usually doesn't matter), then you can add more operations; but you can't add more of those operations because a chain of adds and multiplies will always simplify to just one pair of add and multiply, so the extra operations wouldn't fix anything.
What you can do instead is to use bit shifts and exclusive-or to break up the linearity; like so:
def h(x):
x = x ^ (x >> 16)
x = (a * x + b) & ((1 << 32) - 1)
x = x ^ (x >> 16)
x = (c * x + d) & ((1 << 32) - 1)
x = x ^ (x >> 16)
return x
You can experiment with variations on that if you want. If you set b and d to zero and change the middle 16 to 13 then you get the MurmurHash3 finaliser construction, which is near enough to ideal for most purposes provided you pick good a and c (sadly they can't just be random).
Related
I'm using the following code to map two signed 16-bit integers to the upper and lower 16 bits of an unsigned 32 bit integer.
inline uint32_t to_score(int16_t mg, int16_t eg) {
return ((1u * mg) << 16 | (eg & 0xFFFF));
}
inline int16_t extract_mg(uint32_t score) {
return int16_t(score >> 16);
}
inline int16_t extract_eg(uint32_t score) {
return int16_t(score & 0xFFFF);
}
I need to perform various calculations on both the mg and eg parts simultaneously, before interpolating the two parts at the end of a function.
As I understand it, as long as there is no overflow, it should be safe to add two uint32_ts created to_score, and then extract the int16_ts to find the results of the individual calculations: i.e. the results if I added the the values for mg and eg separately.
I'm not sure whether this assumption holds if either mg or eg are negative, or whether this method can be used for subtraction, multiplication and/or division.
Which operations can I expect to function correctly? Are there alternative ways of representing two integers which can be added/subtracted/multiplied quickly?
There will be a problem with a carry going from the low half into the high half, but it can be avoided with extra operations, as detailed on for example chessprogramming.org/SIMD_and_SWAR_Techniques
z = ((x &~H) + (y &~H)) ^ ((x ^ y) & H)
Where in this case H = 0x80008000.
As an other alternative, it could be done with two additions, but with optimized extraction/recombination:
// low half addition, leaving upper half corrupted but it will be ignored
l = x + y
// high half addition, adding 0 to the bottom so no carry
h = x + (y & 0xFFFF0000)
// recombine
z = (l & 0xFFFF) | (h & 0xFFFF0000)
Subtraction is a minor variation on addition.
Multiplication unfortunately cares about absolute bit-positions, so values have to be moved (shifted) to their notional position for it to work. Actual SIMD can still be used though, such as _mm_mullo_epi16 with SSE2.
C++ signed integers are two's complement, it is on the way to be standardized in C++20, in practice you may already assume that.
Some cases of addition and subtraction would work, those cases that don't cause either of following: eg to overflow, mg to overflow, mg to change sign.
The optimization does not make much sense.
If there's larger array, you can try to get your operations vectorized with proper SIMD instruction, if they are available for your platform by enabling compiler optimization or by using intrinsics ( _mm_adds_pi16 might be the one you need ).
If you have just two integers, just compute them one by one.
I have a liking to finding shortest methods for coding. I have found a need for a method for calculating the sum of the digits(or the number of 1s in a number) of a number represented in binary. I have used bit operators and found this:
r=1;while(a&=a-1)r++;
where a is the number, and r is the count. a is a given integer. Is there any way to shorten this/improve the algorithm?
Shortest as in shortest length of source code.
Your solution assumes a to have an unsigned type.
Yet the code does not work for a = 0. You can fix it this way:
r=!!a;while(a&=a-1)r++;
You can shave one character off this way:
for(r=!!a;a&=a-1;r++);
But here is an alternative solution with the same source length:
for(r=0;a;a/=2)r+=a&1;
As Lundin mentioned, code golfing is off topic on Stack Overflow. It is a fun game, and one can definitely hone his C skills at trying to make the smallest code that is still fully defined for a given problem, but the resulting code is of poor value to casual readers trying to program at a more basic level.
Regarding the on topic part of your question, The quickest method to compute the number of bits in an integer: this problem has been studied intensively and several methods are available. Which one is fastest depends on many factors:
how portable the code need to be. Some processors have built-in instructions for this and the compiler may provide a way to generate them via intrinsics or inline assembly.
the expected range of values for the argument. If the range is small, a simple lookup table may yield the best performance.
the distribution of values of the argument: if a specific value is almost always given, just testing for it might be the fastest solution.
the cpu specific performance: different algorithms use different instructions, the relative performance of different cpus may differ.
Only careful benchmarking will tell you if a given approach is preferable to another, or if you are trying to optimise code whose performance is irrelevant. Provable correctness is much more important than micro-optimisation. Many experts consider optimisation to always be premature.
An interesting solution for 32-bit integers is this:
uint32_t bitcount_parallel(uint32_t v) {
uint32_t c = v - ((v >> 1) & 0x55555555);
c = ((c >> 2) & 0x33333333) + (c & 0x33333333);
c = ((c >> 4) + c) & 0x0F0F0F0F;
c = ((c >> 8) + c) & 0x00FF00FF;
return ((c >> 16) + c) & 0x0000FFFF;
}
If multiplication is fast, here is a potentially faster solution:
uint32_t bitcount_hybrid(uint32_t v) {
v = v - ((v >> 1) & 0x55555555);
v = (v & 0x33333333) + ((v >> 2) & 0x33333333);
return ((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24;
}
Different solutions are detailed here: https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive
The fastest possible code is to generate a look-up table, with the value of the variable as index. Example for uint8_t:
const uint8_t NUMBER_OF_ONES [256] =
{
0, // 0
1, // 1
1, // 2
2, // 3
1, // 4
2, // 5
...
8, // 255
};
You would use it as n = NUMBER_OF_ONES[a];.
The second fastest is to generate smaller look-up tables, to save ROM. For example a nibble-wise look-up for numbers 0 to 15, which you would then call for every nibble in the data type.
Note that the requirement "Shortest as in shortest length of source code." is nonsense, that's not a metric used by professionals. If that's truly what you are after, for the sake of fun or obfuscation, then the question is off-topic on SO and should be asked at https://codegolf.stackexchange.com instead.
I've been reading about the XorShift PRNG especially the paper here
A guy here states that
The number lies in the range [1, 2**64). Note that it will NEVER be 0.
Looking at the code that makes sense:
uint64_t x;
uint64_t next(void) {
x ^= x >> 12; // a
x ^= x << 25; // b
x ^= x >> 27; // c
return x * UINT64_C(2685821657736338717);
}
If x would be zero than every next number would be zero too. But wouldn't that make it less useful? The usual use-pattern would be something like min + rand() % (max - min) or converting the 64 bits to 32 bits if you only need an int. But if 0 is never returned than that might be a serious problem. Also the bits are not 0 or 1 with the same probability as obviously 0 is missing so zeroes or slightly less likely. I even can't find any mention of that on Wikipedia so am I missing something?
So what is a good/appropriate way to generate random, equally distributed numbers from XorShift64* in a given range?
Short answer: No it cannot return zero.
According the Numeric Recipes "it produces a full period of 2^64-1 [...] the missing value is zero".
The essence is that those shift values have been chosen carefully to make very long sequences (full possible one w/o zero) and hence one can be sure that every number is produced. Zero is indeed the fixpoint of this generator, hence it produces 2 sequences: Zero and the other containing every other number.
So IMO for a sufficiently small range max-min it is enough to make a function (next() - 1) % (max - min) + min or even omitting the subtraction altogether as zero will be returned by the modulo.
If one wants better quality equal distribution one should use the 'usual' method by using next() as a base generator with a range of [1, 2^64)
I am nearly sure that there is an x, for which the xorshift operation returns 0.
Proof:
First, we have these equations:
a = x ^ (x >> 12);
b = a ^ (a << 25);
c = b ^ (b >> 27);
Substituting them:
b = (x ^ x >> 12) ^ ((x ^ x >> 12) << 25);
c = b ^ (b >> 27) = ((x ^ x >> 12) ^ ((x ^ x >> 12) << 25)) ^ (((x ^ x >> 12) ^ ((x ^ x >> 12) << 25)) >> 27);
As you can see, although c is a complex equation, it is perfectly abelian.
It means, you can express the bits of c as fully boolean expressions of the bits of x.
Thus, you can simply construct an equation system for the bits b0, b1, b2, ... so:
(Note: the coefficients are only examples, I didn't calculate them, but so would it look):
c0 = x1 ^ !x32 ^ x47 ...
c1 = x23 ^ x45 ^ !x61 ...
...
c63 = !x13 ^ ...
From that point, you have 64 equations and 64 unknowns. You can simply solve it with Gauss-elimination, you will always have a single unique solution.
Except some rare cases, i.e. if the determinant of the coefficients of the equation system is zero, but it is very unlikely in the size of such a big matrix.
Even if it happens, it would mean that you have an information loss in every iteration, i.e. you can't get all of the 2^64 possible values of x, only some of them.
Now consider the much more probable possibility, that the coefficient matrix is non-zero. In this case, for all the possible 2^64 values of x, you have all possible 2^64 values of c, and these are all different.
Thus, you can get zero.
Extension: actually you get zero for zero... sorry, the proof is more useful to show that it is not so simple as it seems for the first spot. The important part is that you can express the bits of c as a boolean function of the bits of x.
There is another problem with this random number generator. And this is that even if you somehow modify the function to not have such problem (for example, by adding 1 in every iteration):
You still can't guarantee that it won't get into a short loop *for any possible values of x. What if there is a 5 length loop for value 345234523452345? Can you prove for all possible initial values? I can't.
Actually, having a really pseudorandom iteration function, your system will likely loop after 2^32 iterations. It has a nearly trivial combinatoric reason, but "unfortunately this margin is small to contain it" ;-)
So:
If a 2^32 loop length is for your PRNG okay, then use a proven iteration function collected from somewhere on the net.
If it isn't, upgrade the bit length to at least 2^128. It will result a roughly 2^64 loop length which is not so bad.
If you still want a 64-bit output, then use 128-bit numeric internally, but return (x>>64) ^ (x&(2^64-1)) (i.e. xor-ing the upper and lower half of the internal state x).
Let's say that A and B are my lower 32bit integers, and T is a 32bit integer that I want to represent the carry from adding A and B. I threw together some quick logic to get T:
T = (A >> 1) + (B >> 1);
T += A & B & 1;
T >>= 31;
It works, but I'm not a huge fan of the number of operations required to do this (3 right shifts, 2 adds, 2 ands). I'd really appreciate input from some bit-twiddling experts on how to make this cleaner/more efficient. Thanks!
For my current problem, I'm limited to what I can do in HLSL. I wouldn't mind SM4 and SM5 specific solutions, but I'm also open to something that would work in general.
Would the technique here work: Efficient 128-bit addition using carry flag?
S = A + B;
T = (S < A);
I am assuming a comparison results in 1 or 0 like in C. If not, add ?1:0 to the end of the statement.
So I've been doing some work recently with the modpow function. One of the forms I needed was Modular Exponentiation when the Modulus is a Power of 2. So I got the code up and running. Great, no problems. Then I read that one trick you can make to get it faster is, instead of using the regular exponent, takes it's modulus over the totient of the modulus.
Now when the modulus is a power of two, the answer is simply the power of 2 less than the current one. Well, that's simple enough. So I coded it, and it worked..... sometimes.
For some reason there are some values that aren't working, and I just can't figure out what it is.
uint32 modpow2x(uint32 B, uint32 X, uint32 M)
{
uint32 D;
M--;
B &= M;
X &= (M >> 1);
D = 1;
if ((X & 1) == 1)
{
D = B;
}
while ((X >>= 1) != 0)
{
B = (B * B) & M;
if ((X & 1) == 1)
{
D = (D * B) & M;
}
}
return D;
}
And this is one set of numbers that it doesn't work for.
Base = 593803430
Exponent = 3448538912
Modulus = 8
And no, there is no check in this function to determine if the Modulus is a power of 2. The reason is that this is an internal function and I already know that only Powers of 2 will be passed to it. However, I have already double checked to make sure that no non-powers of 2 are getting though.
Thanks for any help you guys can give!
It's true that if x is relatively prime to n (x and n have no common factors), then x^a = x^(phi(a)) (mod n), where phi is Euler's totient function. That's because then x belongs to the multiplicative group of (Z/nZ), which has order phi(a).
But, for x not relatively prime to n, this is no longer true. In your example, the base does have a common factor with your modulus, namely 2. So the trick will not work here. If you wanted to, though, you could write some extra code to deal with this case -- maybe find the largest power of 2 that x is divisible by, say 2^k. Then divide x by 2^k, run your original code, shift its output left by k*e, where e is your exponent, and reduce modulo M. Of course, if k isn't zero, this would usually result in an answer of zero.