Good hash function with 2 integers for a special key - C++

I'm trying to determine a key for a map<double, double>. The problem is that the key I want will be generated from a pair of 2 numbers. Are there any good functions which could generate such a key for pairs like (0, 1), (2, 3), (4, 2), (0, 2), etc.?

Go for an N-ary numeral system, where N is strictly greater than the maximum possible value of a number in the pair.
Like this:
hash(a, b) = a + b * N
then
a = hash(a, b) % N
b = hash(a, b) / N
This will guarantee that every pair (a, b) gets its own unique hash(a, b). The same thing happens with decimal numbers: imagine all numbers from 0 (written as 00, 01, 02, ...) to 99 inclusive are your pairs ab. Then hash(a, b) = a * 10 + b, and vice versa: to obtain the first digit you divide the number by 10, and for the second you take it modulo 10.
Why can't we pick just any N, maybe smaller than the maximum of a or b? The answer is: to avoid collisions.
If the N you pick happens to be smaller than your maximum number, it is quite possible that different pairs of numbers will produce the same hash. For example, if you pick N = 10 for the pairs (10, 10) and (0, 11), both hashes will be equal to 110, which is not good for you in this situation.
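A minimal sketch of this pairing scheme in C++ (the bound N below is only illustrative; it must be strictly greater than the largest value either member of the pair can take):

#include <cstdint>
#include <utility>

constexpr std::uint64_t N = 100000;   // illustrative bound; pick one that fits your data

constexpr std::uint64_t make_key(std::uint64_t a, std::uint64_t b) {
    return a + b * N;                 // unique as long as a, b < N
}

constexpr std::pair<std::uint64_t, std::uint64_t> split_key(std::uint64_t key) {
    return { key % N, key / N };      // recovers (a, b)
}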

You should ideally use a pair of the two values, such as std::pair<int, int>, as your key; I don't think writing more code than that is helpful. If you can't have that for some reason, then hashing the pair down to a single key depends on what you're trying to achieve. If the hashes are meant for hash structures like a dictionary, you have to balance collision rate against hashing speed: a perfect hash with no collisions at all takes more time to compute, while the fastest hashing algorithms have relatively more collisions. Finding the right balance is the key here. You should also take into consideration how large your effective hash can be and whether the hashed output should be reversible to give you back the original inputs. Typically, priority should be given to speeding up pairing/hashing/mapping rather than minimizing collision probability (a good hash algorithm will have a low collision chance anyway). For perfect hashes you can see this thread for a plethora of options.
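For the C++ case, a minimal sketch of keeping the pair itself as the key of an unordered container with a custom hash might look like this (my own illustration; the mixing constant is just one commonly used choice, not anything mandated):

#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>

struct PairHash {
    std::size_t operator()(const std::pair<int, int>& p) const {
        // Combine the hashes of the two halves (the classic hash_combine recipe).
        std::size_t h1 = std::hash<int>{}(p.first);
        std::size_t h2 = std::hash<int>{}(p.second);
        return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
    }
};

int main() {
    std::unordered_map<std::pair<int, int>, double, PairHash> values;
    values[{0, 1}] = 3.14;   // the pair itself is the key, no manual packing needed
    return 0;
}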

Related

finding intersections in a given range?

Assume an array of N (N <= 100000) elements a1, a2, ..., aN, and you are given a range L, R in it where 1 <= L <= R <= N. You are required to get the number of values in the given range which are divisible by at least one number from a set S, which is also given; this set can be any subset of {1, 2, ..., 10}. A fast method must be used because there may be more than one range and more than one S (many queries Q, Q <= 100000), so looping over the values each time will be very slow.
I thought of storing the counts of values divisible by each number in the big set {1, 2, ..., 10} in 10 arrays of N elements each, and taking cumulative sums to get the number of values divisible by any specific number in any range in O(1) time. For example, if it asks for the number of values divisible by at least one of 2, 3, 5, then I add the counts of values divisible by each of them and then remove the intersections, but I didn't figure out how to calculate the intersections without 2^10 or 2^9 calculations each time, which will also be very slow (and possibly hugely memory consuming) because it may be done 100000 times. Any ideas?
Your idea is correct. You can use inclusion-exclusion principle and prefix sums to find the answer. There is just one more observation you need to make.
If there's a pair of numbers a and b in the set such that a divides b, we can remove b without changing the answer to the query (indeed, if b | x, then a | x). Thus, we always get a set such that no element divides any other one.
The number of such masks is much smaller than 2^10. In fact, it's 102. Here's the code that computes them:
def good(mask):
    # Bit b-1 of the mask stands for the number b (1..10).
    for i in filter(lambda b: mask & (1 << (b - 1)), range(1, 11)):
        if any(i % j == 0 for j in filter(lambda b: mask & (1 << (b - 1)), range(1, i))):
            return False
    return True

print(list(filter(good, range(1, 2 ** 10))))  # 102 masks in total
Thus the preprocessing requires approximately 100·N operations and about as many numbers to store (which looks reasonably small).
Moreover, there are at most 5 elements in any "good" mask (this can be checked using the code above). Thus we can answer each query using around 2^5 operations.
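A possible C++ sketch of the whole approach (my own illustration, not the answerer's code): for each of the ~102 good masks, precompute prefix counts of elements divisible by the lcm of the mask's numbers, then answer each query by inclusion-exclusion over the non-empty subsets of the reduced set S.

#include <cstddef>
#include <numeric>
#include <vector>

// Bit b-1 of a mask stands for the number b (1..10).
static long long mask_lcm(unsigned mask) {
    long long l = 1;
    for (int b = 1; b <= 10; ++b)
        if (mask & (1u << (b - 1))) l = std::lcm(l, static_cast<long long>(b));
    return l;
}

// True if no selected number divides another selected number.
static bool good(unsigned mask) {
    for (int i = 2; i <= 10; ++i)
        if (mask & (1u << (i - 1)))
            for (int j = 1; j < i; ++j)
                if ((mask & (1u << (j - 1))) && i % j == 0) return false;
    return true;
}

// Drop every number that is a multiple of a smaller selected number.
static unsigned reduce_to_good(unsigned mask) {
    unsigned reduced = mask;
    for (int i = 2; i <= 10; ++i)
        for (int j = 1; j < i; ++j)
            if ((mask & (1u << (j - 1))) && i % j == 0) reduced &= ~(1u << (i - 1));
    return reduced;
}

struct RangeDivCounter {
    // prefix[mask][i] = how many of a[0..i-1] are divisible by lcm(mask); only
    // the ~102 good masks get an array, roughly 100*N numbers in total.
    std::vector<std::vector<int>> prefix;

    explicit RangeDivCounter(const std::vector<long long>& a) : prefix(1u << 10) {
        for (unsigned mask = 1; mask < (1u << 10); ++mask) {
            if (!good(mask)) continue;
            long long l = mask_lcm(mask);
            auto& p = prefix[mask];
            p.assign(a.size() + 1, 0);
            for (std::size_t i = 0; i < a.size(); ++i)
                p[i + 1] = p[i] + (a[i] % l == 0 ? 1 : 0);
        }
    }

    // Number of elements in positions L..R (1-based) divisible by at least one number of S.
    int query(int L, int R, unsigned s_mask) const {
        unsigned s = reduce_to_good(s_mask);   // at most 5 numbers survive
        int answer = 0;
        // Inclusion-exclusion over the non-empty subsets of the reduced mask.
        for (unsigned sub = s; sub; sub = (sub - 1) & s) {
            int cnt = prefix[sub][R] - prefix[sub][L - 1];
            answer += (__builtin_popcount(sub) & 1) ? cnt : -cnt;  // GCC/Clang builtin
        }
        return answer;
    }
};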

Best way to store multi-variable polynomials in Lisp

I need to store polynomials in my Lisp program for adding, subtracting and multiplying, but I cannot find an easy way of storing one.
I've considered the following way
(2x^3 + 2x + 4y^3 - 2z) in a list of lists, where each inner list gives the coefficient for each power of one variable
= ( (0 2 0 2) (0 0 0 4) (0 -2) )
But the varying lengths of the inner lists, and how long they could get, might become a problem.
Is there a generally accepted way to store them in Lisp which makes it as easy as possible to add, subtract and multiply them together?
Assuming you know the number of possible variables beforehand, you could express each term like this: (constant x-exponent y-exponent z-exponent ...). Then 5xy^2 would be (5 1 2 0), and a full expression would just be a list of those terms.
If you want to be able to handle any number of arbitrary variables, you would need to do an associative list along the lines of ((constant 5) (a 0) (b 3) (z 23) (apple 13)).
Either way, if you start with individual terms, it's easy to build more complex expressions and this way you don't need to mess with multiple dimensions.
Maybe this idea will help you partly. You can represent a polynomial as a vector, where the index is the power and the element is the coefficient (you can keep the variable's name alongside it if you need to). I mean 5*x^3 + 10*x^2 + 40x + 50 will look like #(50 40 10 5). Working with such a representation is easy, but it does not look very practical for big powers like x^100.
A multivariable polynomial may be represented as an N-dimensional array, where N is the number of variables.
There are several ways of representing polynomials. As usual the choice of representation is a tradeoff.
One way is an association list from order to coefficient, usually kept sorted by order:
12x^2 + 11x + 10 is ((2 . 12) (1 . 11) (0 . 10))
If you need to compute with sparse polynomials, then this representation is space efficient. x^200 is just ((200 . 1)).
If your calculations consist mostly of non-sparse polynomials, a vector representation is more space efficient:
12x^2 + 11x + 10 is (vector 10 11 12)
The length of the vector minus one gives the order of the polynomial.
If you need polynomials in more than one variable, there are variations of these representations. In particular, you can look at the representation used in Maxima:
http://maxima.sourceforge.net/docs/manual/maxima_14.html
If you happen to have "Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp" by Peter Norvig, there is a nice chapter on polynomials.

How can I find products with a given remainder efficiently in C++?

If I am given a number like 23,128,765 and two more numbers, say 9 and 3, I want to count the substrings whose remainder is 3 when divided by 9. For 9 and 3 those substrings are: 3, 31287, 12, and 876.
How can I calculate the number of such substrings using C++?
One possible way is to check all possible substrings, but that is O(n^2) and I want something faster than that.
You asked for C++, but here's a solution in Common Lisp, which was easier to write and demonstrate in this context:
(defun quotients-with-remainder-in (n divisor remainder)
  (loop with str = (format nil "~D" n)
        for i upfrom remainder to n by divisor
        when (search (format nil "~D" i) str)
          collect i))
> (quotients-with-remainder-in 23128765 9 3)
(3 12 876 31287)
This strides upward by the divisor, converts each candidate value into a string, and searches for that string within the string representation of the upper-bound number.
That creates many strings along the way, but checking whether the decimal representation of one integer occurs within the decimal representation of another does not lend itself to plain arithmetic. For instance, figuring out whether the number 131 "contains" a 3 can't be accomplished with arithmetic and inspection of a two's complement representation.
Well, let's see; perhaps you could multiply the candidate number by powers of 10, subtract each such product from the original number, and then check whether the relevant digits of the difference are all zero, using the modulus operator with a power of 10 one greater than the base-10 logarithm of the product. I started writing such a solution, and it quickly went off the rails. The extra logarithms, multiplications, and divisions will likely yield a solution even slower than the string conversion and substring searching.
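For completeness, a rough C++ translation of the same stride-and-search idea (my own sketch, with the same performance caveats as the Lisp version):

#include <iostream>
#include <string>
#include <vector>

std::vector<long long> quotients_with_remainder_in(long long n, long long divisor,
                                                   long long remainder) {
    std::string digits = std::to_string(n);
    std::vector<long long> found;
    // Walk upward from the remainder in steps of the divisor and keep every
    // value whose decimal form appears somewhere inside the number's digits.
    for (long long i = remainder; i <= n; i += divisor)
        if (digits.find(std::to_string(i)) != std::string::npos)
            found.push_back(i);
    return found;
}

int main() {
    for (long long v : quotients_with_remainder_in(23128765, 9, 3))
        std::cout << v << ' ';   // 3 12 876 31287
    std::cout << '\n';
}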

All subsets in Subset_sum_problem

I'm stuck at solving Subset_sum_problem.
Given a set of integers (S), I need to compute the non-empty subsets whose sum is equal to a given target (T).
Example:
Given set, S{4, 8, 10, 16, 20, 22}
Target, T = 52.
Constraints:
The number of elements N of the set S is limited to 8. Hence an exponential-time solution is acceptable, as N has a small upper bound.
Time and space complexities are not really a concern.
Output:
Possible subsets with sum exactly equal to T=52 are:
{10, 20, 22}
{4, 10, 16, 22}
The solutions given on Wikipedia and on some other pages only check whether such a subset exists (YES/NO).
That doesn't really help to compute all possible subsets, as outlined in the example above.
The dynamic programming approach at this link gives a single such subset, but I need all such subsets.
One obvious approach is to compute all 2^N combinations using brute force, but that would be my last resort.
I'm looking for a programmatic example (preferably C++) or an algorithm which computes such subsets, with illustrations/examples.
When you construct the dynamic-programming table for the subset sum problem, you fill most of it like so (taken from the Wikipedia article referenced in the question):
Q(i,s) := Q(i − 1,s) or (xi == s) or Q(i − 1,s − xi)
This sets the table element to 0 or 1.
This simple formula doesn't let you distinguish between those several cases that can give you 1.
But you can instead set the table element to a value that'd let you distinguish those cases, something like this:
Q(i,s) := {Q(i − 1,s) != 0} * 1 + {xi == s} * 2 + {Q(i − 1,s − xi) != 0} * 4
Then you can traverse the table from the last element. At every element, the stored value tells you how many possible paths lead out of it and in which directions. Following all the paths will give you all combinations of numbers summing up to T, and that's at most 2^N of them.
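A rough C++ sketch of this flagged-table idea (my own illustration, assuming positive integers and T >= 1; the names build and emit are mine):

#include <cstdio>
#include <vector>

using Table = std::vector<std::vector<int>>;

// Flag bits: 1 = Q(i-1, s), 2 = x[i-1] == s, 4 = Q(i-1, s - x[i-1]).
Table build(const std::vector<int>& x, int T) {
    int n = static_cast<int>(x.size());
    Table q(n + 1, std::vector<int>(T + 1, 0));
    for (int i = 1; i <= n; ++i)
        for (int s = 1; s <= T; ++s) {
            int f = 0;
            if (q[i - 1][s]) f |= 1;
            if (x[i - 1] == s) f |= 2;
            if (s - x[i - 1] >= 1 && q[i - 1][s - x[i - 1]]) f |= 4;
            q[i][s] = f;
        }
    return q;
}

// Walk back through the flags, collecting the chosen elements along each path.
void emit(const Table& q, const std::vector<int>& x, int i, int s,
          std::vector<int>& chosen, std::vector<std::vector<int>>& out) {
    if (i == 0) return;
    if (q[i][s] & 1)                        // s is reachable without x[i-1]
        emit(q, x, i - 1, s, chosen, out);
    if (q[i][s] & 2) {                      // x[i-1] alone finishes the sum
        std::vector<int> subset = chosen;
        subset.push_back(x[i - 1]);
        out.push_back(subset);
    }
    if (q[i][s] & 4) {                      // take x[i-1] and keep going
        chosen.push_back(x[i - 1]);
        emit(q, x, i - 1, s - x[i - 1], chosen, out);
        chosen.pop_back();
    }
}

int main() {
    std::vector<int> x = {4, 8, 10, 16, 20, 22};
    int T = 52;
    Table q = build(x, T);
    std::vector<int> chosen;
    std::vector<std::vector<int>> out;
    emit(q, x, static_cast<int>(x.size()), T, chosen, out);
    for (const auto& subset : out) {        // the two subsets from the example above
        for (int v : subset) std::printf("%d ", v);
        std::printf("\n");
    }
}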
If N <= 8, why not just go with the 2^N solution? It's only 256 possibilities, which will be very fast.
Just brute force it. If N is limited to 8, your total number of subsets is 2^8, which is only 256. They give constraints for a reason.
You can express the set inclusion as a binary string where each element is either in the set or out of the set. Then you can just increment your binary string (which can simply be represented as an integer) and then determine which elements are in the set or not using the bitwise & operator. Once you've counted up to 2^N, you know you've gone through all possible subsets.
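A minimal sketch of that bitmask enumeration in C++ (my own illustration, using the example data from the question):

#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> s = {4, 8, 10, 16, 20, 22};
    int target = 52;
    for (unsigned mask = 1; mask < (1u << s.size()); ++mask) {  // every non-empty subset
        int sum = 0;
        for (std::size_t i = 0; i < s.size(); ++i)
            if (mask & (1u << i)) sum += s[i];
        if (sum != target) continue;
        std::printf("{ ");
        for (std::size_t i = 0; i < s.size(); ++i)
            if (mask & (1u << i)) std::printf("%d ", s[i]);
        std::printf("}\n");   // prints { 10 20 22 } and { 4 10 16 22 }
    }
}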
The best way to do it is using a dynamic programming approach. However, dynamic programming just answers whether a subset sum exists or not, as you mentioned in your question.
With dynamic programming, you can output all the solutions by backtracking. However, the overall time complexity to generate all the valid combinations is still 2^n.
So any algorithm better than 2^n in the worst case is close to impossible.
UPD:
From @Knoothe's comment:
You can modify Horowitz and Sahni's algorithm to enumerate all possible subsets. If there are M such sets whose sum equals S, then the overall time complexity is in O(N * 2^(N/2) + MN).

Unbalanced random number generator

I have to pick an element from an ascending array. Smaller elements are considered better, so picking an element from the beginning of the array is considered a better choice. But at the same time I don't want the choice to be deterministic and always the same element. So I'm looking for
a random number generator that produces numbers in the range [0, n], where
the smaller the number is, the higher the chance of it being produced.
This came to my mind:
int num = n;
while (/* the more iterations, the more chance for smaller numbers */)
    num = rand() % (num + 1);   // stay in [0, num] and avoid a modulo by zero
I was wondering if anyone had a better solution.
I did look at some similar questions, but they were about random number generation in general. I'm looking for a solution to this specific kind of random number generation, either an algorithm or a library that provides it.
Generate a random number, say x, in [0, n), and then generate another random floating-point number, say y, in [0, 1]. Then raise x to the power of y, apply the floor function, and you'll get your number.
#include <math.h>
#include <stdlib.h>

int cust(int n)
{
    int x;
    double y, temp;

    x = rand() % n;                          /* uniform integer in [0, n) */
    y = (double)rand() / (double)RAND_MAX;   /* uniform real in [0, 1] */
    temp = pow((double)x, y);                /* raising to a power in [0, 1] pulls x toward smaller values */
    temp = floor(temp);
    return (int)temp;
}
Update: Here are some sample results of calling the above function 10 times, with n = 10, 20 and 30.
2 5 1 0 1 0 1 4 1 0
1 2 4 1 1 2 3 5 17 6
1 19 2 1 2 20 5 1 6 6
A simple ad-hoc approach that came to my mind is to use a standard random generator but duplicate indices. So in the array:
0, 0, 0, 1, 1, 2, 3
the odds are good that a smaller element will be taken.
I don't know exactly what you need. You can also define your own distribution or maybe use a random number generation library. But the suggested approach is simple and easy to configure.
UPDATE2: You don't have to generate the array explicitly. For an array of size 1000, you can generate a random number in the interval [0, 1000000] and then define your own mapping of values: say, intervals of length 1200 for the smaller indices (0-500) and intervals of length 800 for the larger ones (500-1000). The main point is that this way you can easily configure the probabilities without re-implementing the random number generator.
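One way to express this index-weighting idea with the standard <random> facilities instead of building the array or intervals by hand (my own sketch):

#include <cstddef>
#include <random>
#include <vector>

int pick_index(std::mt19937& gen, std::size_t n) {
    std::vector<double> weights(n);
    for (std::size_t i = 0; i < n; ++i)
        weights[i] = static_cast<double>(n - i);   // index 0 is n times likelier than the last
    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    return dist(gen);
}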
Use an appropriate random distribution, e.g. the rounded results of an exponential distribution. Pick a distribution that fits your needs, document the distribution you used, and find a nice implementation. If code under the GNU Public License is an option, use the excellent GNU Scientific Library (GSL), or try Boost.Random.
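A minimal sketch of that rounded-exponential suggestion using only the standard <random> header (the function name, the rate parameter, and the rejection loop are my own choices):

#include <cmath>
#include <random>

int biased_pick(std::mt19937& gen, int n, double lambda = 0.5) {
    std::exponential_distribution<double> exp_dist(lambda);   // smaller values are denser
    for (;;) {
        int value = static_cast<int>(std::floor(exp_dist(gen)));
        if (value <= n) return value;   // reject the rare draws above n
    }
}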
Two tools will solve many random distribution needs:
1) a uniform random number generator, which you have, and
2) a function which maps uniform values onto your target distribution.
I've gotta head to the city now, but I'll make note to write up a couple of examples with a drawing when I get back.
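One possible instance of tool (2), mapping a uniform draw onto a linearly decreasing density over [0, n) via inverse transform sampling (my own illustration):

#include <cmath>
#include <random>

int linear_bias_pick(std::mt19937& gen, int n) {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double u = uniform(gen);
    // The CDF of the triangular density f(x) = 2(n - x) / n^2 is 1 - (1 - x/n)^2;
    // inverting it turns the uniform u into a small-value-biased result.
    return static_cast<int>(n * (1.0 - std::sqrt(1.0 - u)));
}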
There are some worthwhile methods and ideas discussed in this related question (more about generating normally distributed pseudo-random numbers).