I'm looking for a fast hash function, for use in a hash table look-up. The input consists of expressions of the recursive form f(x, y), where x and y can either be functions with two arguments, or variables. A few examples:
b(b(b(b,b),b),b)
foo(bar,bar)
a(a(a,a),a(a(a,a),a))
These expressions can be up to 200,000 characters long, however, and I need to hash thousands of expressions into the same table. I found something that works somewhat:
int hash(string s, int n) {
    unsigned int v = 37;
    for (string::iterator it = s.begin(); it != s.end(); it++)
        v = (v * A) ^ (*it * B);
    return (v * n) % C;
}
Here the input s is only the first 10 characters of the expression, and n is the length of the whole expression. A, B and C are 541, 733 and 941 respectively. This runs in under 100 ms even for the worst cases (long, repetitive, deeply nested expressions like the first example), but I get a lot of collisions, and I would like to know if I can get closer to an O(1) look-up even in these cases.
Try this:
uint32_t hash(const string &s, uint32_t n) {
    uint32_t step = 1 | (s.size() >> 4); // samples ~16 chars spread over the whole string
    uint32_t h = 0x1F351F35;             // Barker code - 2
    for (uint32_t i = 0; i < s.size(); i += step + (h & step)) // stride depends on h, so
        h = ((h << 5) | (h >> (32 - 5))) + (s[i] ^ n ^ i);     // repetitive inputs diverge
    return h % C;                        // C = 941, as defined in the question
}
For input and output in C++ I have only ever used scanf/printf and cin/cout. Recently I came across code doing I/O in an unusual fashion.
Note also that this I/O method makes the code run extremely fast: it uses almost the same algorithm as most of the other submissions, but executes in much less time. Why is this I/O so fast, and how does it work in general?
edit: code
#include <bits/stdtr1c++.h>

#define MAXN 200010
#define MAXQ 200010
#define MAXV 1000010
#define clr(ar) memset(ar, 0, sizeof(ar))
#define read() freopen("lol.txt", "r", stdin)

using namespace std;

const int block_size = 633;
long long res, out[MAXQ];
int n, q, ar[MAXN], val[MAXN], freq[MAXV];

namespace fastio {
    int ptr, ye;
    char temp[25], str[8333667], out[8333669];

    void init() {
        ptr = 0, ye = 0;
        fread(str, 1, 8333667, stdin);
    }

    inline int number() {
        int i, j, val = 0;
        while (str[ptr] < 45 || str[ptr] > 57) ptr++;
        while (str[ptr] > 47 && str[ptr] < 58) val = (val * 10) + (str[ptr++] - 48);
        return val;
    }

    inline void convert(long long x) {
        int i, d = 0;
        for (; ;) {
            temp[++d] = (x % 10) + 48;
            x /= 10;
            if (!x) break;
        }
        for (i = d; i; i--) out[ye++] = temp[i];
        out[ye++] = 10;
    }

    inline void print() {
        fwrite(out, 1, ye, stdout);
    }
}

struct query {
    int l, r, d, i;

    inline query() {}
    inline query(int a, int b, int c) {
        i = c;
        l = a, r = b, d = l / block_size;
    }

    inline bool operator < (const query &other) const {
        if (d != other.d) return (d < other.d);
        return ((d & 1) ? (r < other.r) : (r > other.r));
    }
} Q[MAXQ];

void compress(int n, int *in, int *out) {
    unordered_map<int, int> mp;
    for (int i = 0; i < n; i++) out[i] = mp.emplace(in[i], mp.size()).first->second;
}

inline void insert(int i) {
    res += (long long)val[i] * (1 + 2 * freq[ar[i]]++);
}

inline void erase(int i) {
    res -= (long long)val[i] * (1 + 2 * --freq[ar[i]]);
}

inline void run() {
    sort(Q, Q + q);
    int i, l, r, a = 0, b = 0;
    for (res = 0, i = 0; i < q; i++) {
        l = Q[i].l, r = Q[i].r;
        while (a > l) insert(--a);
        while (b <= r) insert(b++);
        while (a < l) erase(a++);
        while (b > (r + 1)) erase(--b);
        out[Q[i].i] = res;
    }
    for (i = 0; i < q; i++) fastio::convert(out[i]);
}

int main() {
    fastio::init();
    int n, i, j, k, a, b;
    n = fastio::number();
    q = fastio::number();
    for (i = 0; i < n; i++) val[i] = fastio::number();
    compress(n, val, ar);
    for (i = 0; i < q; i++) {
        a = fastio::number();
        b = fastio::number();
        Q[i] = query(a - 1, b - 1, i);
    }
    run();
    fastio::print();
    return 0;
}
This solution, http://codeforces.com/contest/86/submission/22526466 (624 ms, 32 MB RAM used), uses a single fread and manual parsing of numbers from memory (so it uses more memory); many other solutions are slower and use scanf (http://codeforces.com/contest/86/submission/27561563, 1620 ms, 9 MB) or C++ iostream cin (http://codeforces.com/contest/86/submission/27558562, 3118 ms, 15 MB). Not all of the difference between solutions comes from input/output and parsing (the algorithms differ too), but some of it does.
fread(str, 1, 8333667, stdin);
This code uses a single fread library call to read up to 8 MB, which is the full file. The file may have up to 2 (n, q) + 200000 (a_i) + 2*200000 (l, r) numbers of 6-7 digits, with or without line breaks, or separated by one space; that is around 8 chars max per number (6 or 7 for the number itself, as 1000000 is allowed too, plus 1 space or \n), so the maximum input file size is about 0.6 M * 8 bytes ≈ 5 MB.
inline int number(){
    int i, j, val = 0;
    while (str[ptr] < 45 || str[ptr] > 57) ptr++;
    while (str[ptr] > 47 && str[ptr] < 58) val = (val * 10) + (str[ptr++] - 48);
    return val;
}
The code then parses decimal integers manually. According to the ASCII table (http://www.asciitable.com/), decimal codes 48...57 are the digits '0'...'9' (the second while loop): we can just subtract 48 from the character code to get the digit, multiply the partially read val by 10 and add the current digit. The chr < 45 || chr > 57 condition in the first while loop is meant to skip non-digits in the input. This is incorrect, as the skip loop also stops at codes 45, 46, 47 ('-', '.', '/'); the digit loop then returns 0 immediately without advancing ptr, so no number after one of these characters will ever be read.
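A corrected skip loop would accept only real digits (a sketch, assuming the task's numbers are non-negative, as here, and relying on the zero bytes left after the fread as a terminator):
inline int number(){
    int val = 0;
    // Skip anything that is not '0'..'9', instead of also stopping at
    // codes 45-47 ('-', '.', '/') like the original first loop.
    while (str[ptr] && (str[ptr] < '0' || str[ptr] > '9')) ptr++;
    while (str[ptr] >= '0' && str[ptr] <= '9') val = (val * 10) + (str[ptr++] - '0');
    return val;
}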
n = fastio::number();
q = fastio::number();
for (i = 0; i < n; i++) val[i] = fastio::number();
for (i = 0; i < q; i++){
    a = fastio::number();
    b = fastio::number();
The actual reading uses this fastio::number() method; other solutions call scanf or iostream's operator >> in a loop:
for (int i = 0; i < N; i++) {
    scanf("%d", &(arr[i]));
    add(arr[i]);
}
or
for (int i = 1; i <= n; ++i)
    cin >> a[i];
Both methods are more universal, but they make a library call, which reads some chars from an internal buffer (say 4 KB) or issues an OS syscall to refill the buffer, and each function does many checks and has error reporting. For every number in the input, scanf will reparse the same format string in its first argument, do all the logic described in POSIX (http://pubs.opengroup.org/onlinepubs/7908799/xsh/fscanf.html), and do all the error checking. C++ iostream has no format string, but it is still more universal: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/istream.tcc#L156 'operator>>(int& __n)'.
So, standard library functions have more logic inside, more calls and more branching; they are more universal and much safer, and should be used in real-world programming. And this "sport programming" contest allows users to solve the task with standard library functions, which are fast enough if you can come up with the algorithm. Task authors are required to write several solutions with standard I/O functions to check that the time limit of the task is correct and the task can be solved. (The TopCoder system is better with I/O: you don't implement I/O at all, the data is already passed into your function in language structs/collections.)
Sometimes tasks in sport programming have tight memory limits: input files several times bigger than the allowed memory usage, so the programmer can't read the whole file into memory. For example: read 20 million digits of a single very long number from the input file and add 1 to it, with a memory limit of 2 MB. You can't read the full input number in the forward direction, and correct reading in chunks in the backward direction is very hard to get right; you just have to forget the standard method of addition (columnar addition) and build an FSM (finite-state machine) whose state counts sequences of 9s, as sketched below.
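A minimal sketch of such a machine (my illustration, not from any particular judge): the +1 carry only affects the trailing run of 9s, so it is enough to remember the last non-9 digit and how many 9s followed it, printing everything older as soon as it is known to be safe.
#include <cstdio>

int main() {
    int pending = -1;     // last non-9 digit seen; -1 = none yet
    long long nines = 0;  // how many consecutive 9s follow it
    int c;
    while ((c = getchar()) >= '0' && c <= '9') {
        if (c == '9') { nines++; continue; }
        if (pending >= 0) putchar('0' + pending); // out of reach of the carry
        while (nines) { putchar('9'); nines--; }  // these 9s are safe too
        pending = c - '0';
    }
    if (pending >= 0) putchar('0' + pending + 1); // the carry stops here
    else putchar('1');                            // input was all 9s
    while (nines) { putchar('0'); nines--; }      // trailing 9s roll over to 0
    putchar('\n');
    return 0;
}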
I am trying to find/create a bit twiddling algorithm that generates all K-bit-count permutations of the 1s in an N-bit-count bitmask where K < N. The number of permutations is (N choose K) = N!/(K!(N-K)!).
These two algorithms, from Bit Twiddling Hacks, are close.
unsigned int v; // current permutation of bits where bitCount(v) == K
unsigned int w; // next permutation where bitCount(w) == bitCount(v)

unsigned int t = v | (v - 1);
// trailingZeroCount(v) is, e.g., __builtin_ctz(v) on GCC/Clang
w = (t + 1) | (((~t & -~t) - 1) >> (trailingZeroCount(v) + 1));
Similarly:
unsigned int v; // current permutation of bits where bitCount(v) == K
unsigned int w; // next permutation where bitCount(w) == bitCount(v)

unsigned int t = (v | (v - 1)) + 1;
w = t | ((((t & -t) / (v & -v)) >> 1) - 1);
These algorithms generate permutations in lexicographic order, which I don't necessarily need. However, I do need an algorithm that incorporates a bitmask m.
unsigned int m; // bitmask from which next permutation is chosen
// where bitCount(m) == N
unsigned int v; // current permutation of bits where (v & m) == v
// and bitCount(v) == K
unsigned int w; // next permutation of bits where (w & m) == w
// and bitCount(w) == bitCount(v)
...
One option is to use the BMI2 instructions PEXT/PDEP, available on Intel Haswell and newer: PEXT compresses the current permutation to the low bits, and PDEP expands the next permutation from the bit-twiddling solution you mention back onto the mask. If you don't have such instructions available, you'll probably need a loop. I give two possibilities in this answer.
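As an illustration of the PEXT/PDEP route (a sketch; next_masked_permutation is a made-up name, and it assumes v != 0 and that v is not yet the last permutation of its bits within m):
#include <immintrin.h> // _pext_u32 / _pdep_u32 (compile with -mbmi2)

unsigned next_masked_permutation(unsigned v, unsigned m) {
    unsigned c = _pext_u32(v, m);   // compress v's bits to the low end
    unsigned t = c | (c - 1);       // classic lexicographic next permutation
    unsigned w = (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(c) + 1));
    return _pdep_u32(w, m);         // expand the result back onto the mask
}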
I have a bit-mask of N chars in size, which is statically known (i.e. can be calculated at compile time, but it's not a single constant, so I can't just write it down), with bits set to 1 denoting the "wanted" bits. And I have a value of the same size, which is only known at runtime. I want to collect the "wanted" bits from that value, in order, into the beginning of a new value. For simplicity's sake let's assume the number of wanted bits is <= 32.
Completely unoptimized reference code which hopefully has the correct behaviour:
#include <climits>  // CHAR_BIT
#include <cstdlib>  // abort

template<int N, const char mask[N]>
unsigned gather_bits(const char* val)
{
    unsigned result = 0;
    char* result_p = (char*)&result;
    int pos = 0;
    for (int i = 0; i < N * CHAR_BIT; i++)
    {
        if (mask[i/CHAR_BIT] & (1 << (i % CHAR_BIT)))
        {
            if (val[i/CHAR_BIT] & (1 << (i % CHAR_BIT)))
            {
                if (pos < sizeof(unsigned) * CHAR_BIT)
                {
                    result_p[pos/CHAR_BIT] |= 1 << (pos % CHAR_BIT);
                }
                else
                {
                    abort();
                }
            }
            pos += 1;
        }
    }
    return result;
}
Although I'm not sure whether that formulation actually allows access to the contents of the mask at compile time. But in any case, it's available for use, maybe a constexpr function or something would be a better idea. I'm not looking here for the necessary C++ wizardry (I'll figure that out), just the algorithm.
An example of input/output, with 16-bit values and imaginary binary notation for clarity:
mask = 0b0011011100100110
val = 0b0101000101110011
--
wanted = 0b__01_001__1__01_ // retain only those bits which are set in the mask
result = 0b0000000001001101 // bring them to the front
^ gathered bits begin here
My questions are:
What's the most performant way to do this? (Are there any hardware instructions that can help?)
What if both the mask and the value are restricted to be unsigned, so a single word, instead of an unbounded char array? Can it then be done with a fixed, short sequence of instructions?
There will be pext (parallel bit extract) in Intel Haswell; it does exactly what you want. I don't know what the performance of that instruction will be, but it will probably be better than the alternatives. This operation is also known as "compress-right" or simply "compress"; the implementation from Hacker's Delight is this:
unsigned compress(unsigned x, unsigned m) {
    unsigned mk, mp, mv, t;
    int i;

    x = x & m;        // Clear irrelevant bits.
    mk = ~m << 1;     // We will count 0's to right.
    for (i = 0; i < 5; i++) {
        mp = mk ^ (mk << 1);           // Parallel prefix.
        mp = mp ^ (mp << 2);
        mp = mp ^ (mp << 4);
        mp = mp ^ (mp << 8);
        mp = mp ^ (mp << 16);
        mv = mp & m;                   // Bits to move.
        m = m ^ mv | (mv >> (1 << i)); // Compress m.
        t = x & mv;
        x = x ^ t | (t >> (1 << i));   // Compress x.
        mk = mk & ~mp;
    }
    return x;
}
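As a quick check (assuming the compress() above is in scope, and using C++14 binary literals), the example values from the question give exactly the expected result:
#include <cassert>

int main() {
    // mask = 0b0011011100100110, val = 0b0101000101110011 from the question;
    // the gathered bits should be 0b0000000001001101.
    assert(compress(0b0101000101110011u, 0b0011011100100110u)
           == 0b0000000001001101u);
    return 0;
}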
So I've got a custom randomizer class that uses a Mersenne Twister (the code I use is adapted from this site). All seemed to be working well, until I started testing different seeds (I normally use 42 as a seed, to ensure that each time I run my program, the results are the same, so I can see how code changes influence things).
It turns out that, no matter what seed I choose, the code produces the exact same series of numbers each time. Clearly I'm doing something wrong, but I don't know what. Here is my seed function:
void Randomizer::Seed(unsigned long int Seed)
{
    int ii;
    x[0] = Seed & 0xffffffffUL;
    for (ii = 0; ii < N; ii++)
    {
        x[ii] = (1812433253UL * (x[ii - 1] ^ (x[ii - 1] >> 30)) + ii);
        x[ii] &= 0xffffffffUL;
    }
}
And this is my Rand() function
unsigned long int Randomizer::Rand()
{
    unsigned long int Result;
    unsigned long int a;
    int ii;

    // Refill x if exhausted
    if (Next == N)
    {
        Next = 0;
        for (ii = 0; ii < N - 1; ii++)
        {
            Result = (x[ii] & U) | x[ii + 1] & L;
            a = (Result & 0x1UL) ? A : 0x0UL;
            x[ii] = x[(ii + M) % N] ^ (Result >> 1) ^ a;
        }
        Result = (x[N - 1] & U) | x[0] & L;
        a = (Result & 0x1UL) ? A : 0x0UL;
        x[N - 1] = x[M - 1] ^ (Result >> 1) ^ a;
    }

    Result = x[Next++];

    // Improves distribution
    Result ^= (Result >> 11);
    Result ^= (Result << 7) & 0x9d2c5680UL;
    Result ^= (Result << 15) & 0xefc60000UL;
    Result ^= (Result >> 18);

    return Result;
}
The various values are:
#define A 0x9908b0dfUL
#define U 0x80000000UL
#define L 0x7fffffffUL
int Randomizer::N = 624;
int Randomizer::M = 397;
int Randomizer::Next = 0;
unsigned long Randomizer::x[624];
Can anyone help me figure out why different seeds don't result in different sequences of numbers?
Your Seed() function assigns to x[0], then starts looping at ii=0, which overwrites x[0] with an undefined value (it references x[-1]). Start your loop at 1, and you'll probably be all set.
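For reference, a sketch of the corrected Seed() (only the loop start changes; this matches the reference MT19937 initialization your code is clearly based on):
void Randomizer::Seed(unsigned long int Seed)
{
    x[0] = Seed & 0xffffffffUL;
    for (int ii = 1; ii < N; ii++) // start at 1: x[0] must keep the seed
    {
        x[ii] = (1812433253UL * (x[ii - 1] ^ (x[ii - 1] >> 30)) + ii);
        x[ii] &= 0xffffffffUL;
    }
}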
Writing your own randomizer is dangerous. Why? It's hard to get right (see above), it's hard to know if you've done it right, and if it's wrong, things that rely on correctly distributed random numbers will not work quite right. Hopefully that thing is not cryptography or statistical modeling where the tails matter... Think about using the standard <random> header (e.g. std::mt19937), or if you're not on C++11 yet, boost::random.
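For instance, with C++11 (just an illustration):
#include <iostream>
#include <random>

int main() {
    std::mt19937 gen(42);                          // standard Mersenne Twister, seeded
    std::uniform_int_distribution<int> dist(1, 6); // e.g. a die roll
    for (int i = 0; i < 5; i++)
        std::cout << dist(gen) << ' ';             // same sequence on every run
    std::cout << '\n';
    return 0;
}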
For a given unsigned integer x, how can I find the smallest n such that 2^n ≥ x in O(1)? In other words, I want to find the index of the highest set bit in the binary representation of x (plus 1 if x is not a power of 2), in O(1) (not dependent on the size of the integer or the size of a byte).
If you have no memory constraints, then you can use a lookup table (one entry for each possible value of x) to achieve O(1) time.
If you want a practical solution, most processors will have some kind of "find highest bit set" opcode. On x86, for instance, it's BSR. Most compilers will have a mechanism to write raw assembler.
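On GCC or Clang, for example, the __builtin_clz intrinsic (which compiles down to BSR/LZCNT on x86) avoids writing assembler by hand; a sketch for 32-bit unsigned (my example):
// Smallest n such that 2^n >= x, via count-leading-zeros.
unsigned ceil_log2(unsigned x) {
    if (x <= 1) return 0;             // 2^0 = 1 covers x = 0 and x = 1
    return 32 - __builtin_clz(x - 1); // rounds up when x is not a power of 2
}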
Ok, since so far nobody has posted a compile-time solution, here's mine. The precondition is that your input value is a compile-time constant. If you have that, it's all done at compile-time.
#include <iostream>
#include <iomanip>

// This should really come from a template meta lib, no need to reinvent it here,
// but I wanted this to compile as is.
namespace templ_meta {
    // A run-of-the-mill compile-time if.
    template<bool Cond, typename T, typename E> struct if_;
    template<           typename T, typename E> struct if_<true , T, E> {typedef T result_t;};
    template<           typename T, typename E> struct if_<false, T, E> {typedef E result_t;};

    // This so we can use a compile-time if tailored for types, rather than integers.
    template<int I>
    struct int2type {
        static const int result = I;
    };
}

// This does the actual work.
template< int I, unsigned int Idx = 0>
struct index_of_high_bit {
    static const unsigned int result =
        templ_meta::if_< I==0
                       , templ_meta::int2type<Idx>
                       , index_of_high_bit<(I>>1),Idx+1>
                       >::result_t::result;
};

// just some testing
namespace {
    template< int I >
    void test()
    {
        const unsigned int result = index_of_high_bit<I>::result;
        std::cout << std::setfill('0')
                  << std::hex << std::setw(2) << std::uppercase << I << ": "
                  << std::dec << std::setw(2) << result
                  << '\n';
    }
}

int main()
{
    test<0>();
    test<1>();
    test<2>();
    test<3>();
    test<4>();
    test<5>();
    test<7>();
    test<8>();
    test<9>();
    test<14>();
    test<15>();
    test<16>();
    test<42>();
    return 0;
}
'twas fun to do that.
In <cmath> there are logarithm functions that will perform this computation for you. Since 2^n ≥ x is equivalent to n ≥ log2(x) = log(x)/log(2), rounding up gives:
int n = ceil(log(x)/log(2));
This is obviously O(1).
It's a question about finding the highest bit set (as lshtar and Oli Charlesworth pointed out). Bit Twiddling Hacks gives a solution which takes about 7 operations for 32 Bit Integers and about 9 operations for 64 Bit Integers.
You can use precalculated tables.
If your number is in [0,255] interval, simple table look up will work.
If it's bigger, then you may split it by bytes and check them from high to low, as sketched below.
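A sketch of that byte-split approach (my code; the table and function names are made up):
#include <cstdint>

static int log2_byte[256]; // log2_byte[i] = index of the highest bit in i

void init_table() {
    log2_byte[0] = -1;      // no bit set
    for (int i = 1; i < 256; i++)
        log2_byte[i] = log2_byte[i / 2] + 1;
}

int highest_bit(uint32_t x) { // returns -1 if x == 0
    for (int shift = 24; shift >= 0; shift -= 8)
        if (x >> shift)
            return shift + log2_byte[(x >> shift) & 0xff];
    return -1;
}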
Perhaps this link will help. Warning: the code is not exactly straightforward and seems rather unmaintainable.
uint64_t v; // Input value to find position with rank r.
unsigned int r; // Input: bit's desired rank [1-64].
unsigned int s; // Output: Resulting position of bit with rank r [1-64]
uint64_t a, b, c, d; // Intermediate temporaries for bit count.
unsigned int t; // Bit count temporary.
// Do a normal parallel bit count for a 64-bit integer,
// but store all intermediate steps.
// a = (v & 0x5555...) + ((v >> 1) & 0x5555...);
a = v - ((v >> 1) & ~0UL/3);
// b = (a & 0x3333...) + ((a >> 2) & 0x3333...);
b = (a & ~0UL/5) + ((a >> 2) & ~0UL/5);
// c = (b & 0x0f0f...) + ((b >> 4) & 0x0f0f...);
c = (b + (b >> 4)) & ~0UL/0x11;
// d = (c & 0x00ff...) + ((c >> 8) & 0x00ff...);
d = (c + (c >> 8)) & ~0UL/0x101;
t = (d >> 32) + (d >> 48);
// Now do branchless select!
s = 64;
// if (r > t) {s -= 32; r -= t;}
s -= ((t - r) & 256) >> 3; r -= (t & ((t - r) >> 8));
t = (d >> (s - 16)) & 0xff;
// if (r > t) {s -= 16; r -= t;}
s -= ((t - r) & 256) >> 4; r -= (t & ((t - r) >> 8));
t = (c >> (s - 8)) & 0xf;
// if (r > t) {s -= 8; r -= t;}
s -= ((t - r) & 256) >> 5; r -= (t & ((t - r) >> 8));
t = (b >> (s - 4)) & 0x7;
// if (r > t) {s -= 4; r -= t;}
s -= ((t - r) & 256) >> 6; r -= (t & ((t - r) >> 8));
t = (a >> (s - 2)) & 0x3;
// if (r > t) {s -= 2; r -= t;}
s -= ((t - r) & 256) >> 7; r -= (t & ((t - r) >> 8));
t = (v >> (s - 1)) & 0x1;
// if (r > t) s--;
s -= ((t - r) & 256) >> 8;
s = 65 - s;
As has been mentioned, the n you're looking for is the position of the highest set bit of x, plus one if x is not a power of two (i.e. not of the form 10.....0 in binary); equivalently, it is the length of the binary representation of x - 1.
I seriously doubt there exists a true solution in O(1), unless you consider translations to binary representation to be O(1).
For a 32 bit int, the following pseudocode will be O(1).
highestBit(x)
    bit = 1
    highest = 0
    for i = 1 to 32
        if x & bit != 0
            highest = i
        bit = bit * 2
    return highest + 1
It doesn't matter how big x is, it always checks all 32 bits. Thus constant time.
If the input can be of any size, say n digits long, then any solution that reads the input will read n digits and must be at least O(n). Unless someone comes up with a solution that doesn't read the input, it is impossible to find an O(1) solution.
After some searching on the internet I found these two versions for a 32-bit unsigned integer. I have tested them and they work. It is clear to me why the second one works, but I'm still thinking about the first one...
1.
unsigned int RoundUpToNextPowOf2(unsigned int v)
{
    unsigned int r = 1;
    if (v > 1)
    {
        float f = (float)v;
        // Type-pun the float to read its IEEE 754 exponent field (bits 23-30);
        // subtracting the bias 0x7f gives floor(log2(v)) (modulo float rounding
        // at the boundaries), so t = 2^floor(log2(v)). Note the pointer cast
        // is a strict-aliasing violation, but it illustrates the trick.
        unsigned int const t = 1U << ((*(unsigned int *)&f >> 23) - 0x7f);
        // If t < v, v was not a power of two: shift once more to round up.
        r = t << (t < v);
    }
    return r;
}
2.
unsigned int RoundUpToNextPowOf2(unsigned int v)
{
    v--;
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    v++;
    return v;
}
edit: The first one is clear as well.
An interesting question. What do you mean by not depending on the size of int or the number of bits in a byte? To encounter a different number of bits in a byte, you'll have to use a different machine, with a different set of machine instructions, which may or may not affect the answer.
Anyway, based sort of vaguely on the first solution proposed by Mihran, I get:
#include <cmath>  // frexp

int
topBit( unsigned x )
{
    int r = 1;
    if ( x > 1 ) {
        if ( frexp( static_cast<double>( x ), &r ) != 0.5 ) {
            ++r;
        }
    }
    return r - 1;
}
This works within the constraint that the input value must be exactly representable in a double; if the input is unsigned long long, this might not be the case, and on some of the more exotic platforms it might not even be the case for unsigned.
The only other constant time (with respect to the number of bits) solution I can think of is:
int
topBit( unsigned x )
{
    return x == 0 ? 0 : ceil( log2( static_cast<double>( x ) ) );
}
This has the same constraint with regards to x being exactly representable in a double, and may also suffer from rounding errors inherent in the floating point operations (although if log2 is implemented correctly, I don't think that this should be the case). If your compiler doesn't support log2 (a C++11 feature, but also present in C99, so I would expect most compilers to have implemented it), then of course log( x ) / log( 2 ) could be used, but I suspect that this will increase the risk of a rounding error resulting in a wrong result.
FWIW, I find the O(1) on the number of bits a bit illogical, for the reasons I specified above: the number of bits is just one of the many "constant factors" which depend on the machine on which you run. Anyway, I came up with the following purely integer solution, which is O(lg n) in the number of bits n, and O(1) for everything else:
template< int k >
struct TopBitImpl
{
    static int const k2 = k / 2;
    static unsigned const m = ~0U << k2;
    int operator()( unsigned x ) const
    {
        unsigned r = ((x & m) != 0) ? k2 : 0;
        return r + TopBitImpl<k2>()(r == 0 ? x : x >> k2);
    }
};

template<>
struct TopBitImpl<1>
{
    int operator()( unsigned x ) const
    {
        return 0;
    }
};
int
topBit( unsigned x )
{
    return TopBitImpl<std::numeric_limits<unsigned>::digits>()(x)
           + (((x & (x - 1)) != 0) ? 1 : 0);
}
A good compiler should be able to inline the recursive calls, resulting in close to optimal code.
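For example, a quick check of a few values (assuming the code above, plus <limits> for std::numeric_limits):
#include <iostream>
#include <limits>

// (TopBitImpl and topBit as defined above.)

int main()
{
    // topBit(x) is the smallest n with 2^n >= x.
    std::cout << topBit(1) << ' '   // 0
              << topBit(2) << ' '   // 1
              << topBit(3) << ' '   // 2
              << topBit(8) << ' '   // 3
              << topBit(9) << '\n'; // 4
    return 0;
}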