Hash functions and random permutation - C++

After reading this question, I was wondering: is it possible, using O(1) space, to generate a random permutation of the sequence [1...n] with a uniform distribution using something like double hashing?
I tried this with a small example for the sequence [1,2,3,4,5] and it works, but it fails to scale to larger sets.
#include <iostream>

const int size = 5;  // length of the sequence being permuted

int h1(int k) {
    return 5 - (k % 7);
}

int h2(int k) {
    return (k % 3) + 1;
}

int hash(int k, int i) {
    return (h1(k) + i * h2(k)) % size;
}

int main() {
    for (int k = 0; k < 10; k++) {
        std::cout << "k=" << k << std::endl;
        for (int i = 0; i < 5; i++) {
            int q = hash(k, i);
            if (q < 0) q += 5;  // wrap negative remainders back into [0, 4]
            std::cout << q;
        }
        std::cout << std::endl;
    }
}

You can try another approach.
Take an arbitrary integer P such that GCD(P, N) == 1, where GCD(P, N) is the greatest common divisor of P and N (e.g. GCD(70, 42) == 14, GCD(24, 35) == 1).
Get the sequence K[i] ::= (P * i) mod N + 1, for i from 1 to N.
It's proven that the sequence K[i] enumerates all numbers between 1 and N with no repeats (actually K[N + 1] == K[1], but that is not a problem because we only need the first N numbers).
If you can efficiently generate such numbers P with uniform distribution (e.g. with a good random function), using the Euclidean algorithm to check the GCD in O(log N) time, you'll get what you want.
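A minimal C++17 sketch of this approach (the rejection loop for drawing P and the demo value N = 10 are my additions):

#include <iostream>
#include <numeric>  // std::gcd
#include <random>

int main() {
    const long long N = 10;
    std::mt19937_64 rng(std::random_device{}());
    std::uniform_int_distribution<long long> dist(1, N - 1);

    // Draw multipliers until one is coprime to N.
    long long P;
    do {
        P = dist(rng);
    } while (std::gcd(P, N) != 1);

    // K[i] = (P * i) mod N + 1 visits every value in [1, N] exactly once.
    for (long long i = 1; i <= N; i++)
        std::cout << (P * i) % N + 1 << ' ';
    std::cout << '\n';
}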

It is not possible to generate a "random" permutation without some randomness; it doesn't even make sense. Your code will generate the same permutation every time.
I suspect you intend to pick two different random hash functions every time. But even that won't work with hash functions like yours (a ± k % b for a, b chosen at random), as you need O(n log n) bits of randomness to specify a permutation (there are n! permutations, so indexing one takes log2(n!) ≈ n·log2(n) bits).

I'm not sure what the question is. If you want a random permutation,
you want a random number generator, not a hash function. A hash
function is (and must be) deterministic, so it cannot be used for a
"random" permutation. And a hash is not a permutation of anything.
I don't think that a random permutation can be O(1) space. You've got
to keep track somehow of the elements which have already been used.
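For reference, the standard way to get a uniformly random permutation in C++ is a Fisher-Yates shuffle via std::shuffle, which needs O(n) space for the array itself but only O(1) extra; a minimal sketch:

#include <algorithm>  // std::shuffle
#include <iostream>
#include <numeric>    // std::iota
#include <random>
#include <vector>

int main() {
    std::vector<int> v(5);
    std::iota(v.begin(), v.end(), 1);          // fill with 1..5
    std::mt19937 rng(std::random_device{}());  // seeded PRNG
    std::shuffle(v.begin(), v.end(), rng);     // Fisher-Yates under the hood
    for (int x : v) std::cout << x << ' ';
    std::cout << '\n';
}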

Related

Need optimization tips for a subset sum like problem with a big constraint

Given a number 1 <= N <= 3*10^5, count all subsets of the set {1, 2, ..., N-1} that sum to N. This is essentially a modified version of the subset sum problem, with the modification that the target sum is N and the set is simply the integers 1 through N-1.
I think I have solved this using a DP ordered map and an inclusion/exclusion recursive algorithm, but due to the time and space complexity I can't compute more than 10000 elements.
#include <iostream>
#include <chrono>
#include <map>
#include "bigint.h"
using namespace std;

// 2d hashmap to store values from recursion; keys: i & sum; value: count
map<pair<int, int>, bigint> hmap;

bigint counter(int n, int i, int sum) {
    // end case
    if (i == 0) {
        if (sum == 0) {
            return 1;
        }
        return 0;
    }
    // alternative end case if the sum reaches zero before it has finished
    // iterating through all of the possible combinations
    if (sum == 0) {
        return 1;
    }
    // case if the result of the recursion is already in the hashmap
    if (hmap.find(make_pair(i, sum)) != hmap.end()) {
        return hmap[make_pair(i, sum)];
    }
    // only proceed with further recursion if the resulting sum wouldn't be negative
    if (sum - i < 0) {
        // optimization that skips unnecessary recursive branches
        return hmap[make_pair(i, sum)] = counter(n, sum, sum);
    }
    else {
        // include the number / don't include the number
        return hmap[make_pair(i, sum)] = counter(n, i - 1, sum - i) + counter(n, i - 1, sum);
    }
}
The function is called with starting values N, N-1, and N: the number of elements, the iterator (which decrements), and the sum of the recursive branch (which decreases with every included value).
This is the code that calculates the number of subsets. For an input of 3000 it takes around ~22 seconds to output the result, which is 40 digits long. Because of the long digits I had to use an arbitrary-precision library, bigint from rgroshanrg, which works fine for values less than ~10000. Testing beyond that gives me a segfault on lines 28-29, maybe due to the stored arbitrary-precision values becoming too big and conflicting in the map. I need to somehow speed this code up so it can work with values beyond 10000, but I am stumped. Any ideas, or should I switch to another algorithm and data storage?
Here is a different algorithm, described in a paper by Evangelos Georgiadis, "Computing Partition Numbers q(n)":
std::vector<BigInt> RestrictedPartitionNumbers(int n)
{
    std::vector<BigInt> q(n, 0);
    // initialize q with A010815
    for (int i = 0; ; i++)
    {
        int n0 = i * (3 * i - 1) >> 1;
        if (n0 >= q.size())
            break;
        q[n0] = 1 - 2 * (i & 1);
        int n1 = i * (3 * i + 1) >> 1;
        if (n1 < q.size())
            q[n1] = 1 - 2 * (i & 1);
    }
    // construct A000009 as per "Evangelos Georgiadis, Computing Partition Numbers q(n)"
    for (size_t k = 0; k < q.size(); k++)
    {
        size_t j = 1;
        size_t m = k + 1;
        while (m < q.size())
        {
            if ((j & 1) != 0)
                q[m] += q[k] << 1;
            else
                q[m] -= q[k] << 1;
            j++;
            m = k + j * j;
        }
    }
    return q;
}
It's not the fastest algorithm out there, and it took about half a minute on my computer for n = 300000. But you only need to run it once (since it computes all partition numbers up to some bound), and it doesn't take a lot of memory (a bit over 150MB).
The results go up to but excluding n, and they assume that each number is allowed to be a partition of itself, e.g. the set {4} counts as a partition of the number 4. Your definition of the problem excludes that case, so you need to subtract 1 from the result.
Maybe there's a nicer way to express A010815; that part of the code isn't slow, though, I just think it looks bad.
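A usage sketch under those conventions (a fragment, assuming BigInt is your arbitrary-precision type and supports subtracting an int):

// Count subsets of {1, ..., N-1} that sum to N.
int N = 3000;
std::vector<BigInt> q = RestrictedPartitionNumbers(N + 1);  // results for 0..N
BigInt answer = q[N] - 1;  // drop the one-element partition {N} itself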

How do I generate a random number between 1 and k which is not equal to n or m in C++?

If n=2, m=3 and k=5, then the answer could be 1, 4 or 5. (k is always greater than or equal to 3.) If k=3 and n=1, m=3, then the answer will be 2.
Assuming m≠n, there are four cases to consider:
(m < 1 or m > k) and (n < 1 or n > k):
Just return a random number from 1 to k.
(1 ≤ m ≤ k) and (n < 1 or n > k):
Generate a random number from 1 to k–1. If it is equal to m, output k instead.
(m < 1 or m > k) and (1 ≤ n ≤ k):
Generate a random number from 1 to k–1. If it is equal to n, output k instead.
(1 ≤ m ≤ k) and (1 ≤ n ≤ k):
Assuming m < n (swap them first if necessary), generate a random number from 1 to k–2. If it is greater than or equal to m, add one; if the result is then greater than or equal to n, add one again. (Simply remapping m to k–1 and n to k is not safe here, because n itself might be k–1.)
If m=n, you could just set one of them to zero.
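A sketch of that case analysis (uniformRand is a hypothetical helper returning a uniform integer in [lo, hi]; the last case uses the shift trick described above):

#include <random>
#include <utility>  // std::swap

// Hypothetical helper: uniform integer in [lo, hi].
int uniformRand(int lo, int hi) {
    static std::mt19937 rng(std::random_device{}());
    return std::uniform_int_distribution<int>(lo, hi)(rng);
}

// Uniform value in [1, k] excluding m and n; assumes m != n.
int randExcluding(int k, int m, int n) {
    if (m > n) std::swap(m, n);              // ensure m < n
    bool mIn = (1 <= m && m <= k);
    bool nIn = (1 <= n && n <= k);
    if (!mIn && !nIn) return uniformRand(1, k);
    if (mIn != nIn) {                        // exactly one excluded value in range
        int bad = mIn ? m : n;
        int r = uniformRand(1, k - 1);
        return (r == bad) ? k : r;           // remap the collision to k
    }
    int r = uniformRand(1, k - 2);           // both in range: shift past m, then n
    if (r >= m) ++r;
    if (r >= n) ++r;
    return r;
}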
Without considering the time consumed for a long range, this can be counted as an answer (but a non-optimized one):

int random;
while (true)
{
    random = (rand() % k) + 1;  // random number from 1 to k
    if (random != n && random != m)
    {
        break;
    }
}
In C++, rand() creates a pseudo-random sequence of numbers, so I recommend setting a seed value with srand(...). Another solution:

#include <iostream>
#include <cstdlib>
using namespace std;

int main()
{
    // Seed the pseudo-random number generator
    srand(1234);
    // Initialize variables
    int n, m, k, random;
    // Read in values
    cout << "Enter n: ";
    cin >> n;
    cout << "Enter m: ";
    cin >> m;
    cout << "Enter k: ";
    cin >> k;
    // Generate a random number from 1 to k,
    // repeating while it is equal to n or m
    do {
        random = rand() % k + 1;
    } while (random == n || random == m);
    cout << "Number generated: " << random << "\n";
    return 0;
}
There are lots of ways to do it, and the choice depends on what the underlying requirements are.
// brute force (assumes that n < m):
int res = rand() % (k - 2) + 1;
if (n <= res) ++res;
if (m <= res) ++res;

// elimination:
int res = rand() % k + 1;
while (res == n || res == m)
    res = rand() % k + 1;

// table lookup (k == 8, n == 4, m == 7):
int results[] = { 1, 2, 3, 5, 6, 8 };
int res = rand() % (sizeof results / sizeof *results);
res = results[res];
I'd probably go with the brute force approach; it always works, provided you know the relative order of n and m. A more sophisticated version would check which one is smaller, and swap them if necessary so that n is less than m.
Elimination is also always correct, and when k is large the loop will rarely be executed, so it might be slightly faster than brute force. When k is small it could loop many times. This approach is sometimes used in generating more complicated distributions such as a pair of coordinates that are within a circle (generate two coordinates and if they're outside the circle throw them away and try again).
Table lookup is probably not a good choice, but if you know the values of k, n, and m at compile time it could be slightly faster than either of the other two. Of course, for large values of k there's a lot of wasted space.

Count the number of partitions of a set with n elements into k subsets

This program counts the number of partitions of a set with n elements into k subsets. I am confused by this line:
return k*countP(n-1, k) + countP(n-1, k-1);
Can someone explain what is happening here? Why are we multiplying by k?
NOTE: I know this is not the best way to calculate the number of partitions; that would be DP.
// A C++ program to count the number of partitions
// of a set with n elements into k subsets
#include <iostream>
using namespace std;

// Returns the count of different partitions of n
// elements into k subsets
int countP(int n, int k)
{
    // Base cases
    if (n == 0 || k == 0 || k > n)
        return 0;
    if (k == 1 || k == n)
        return 1;

    // S(n+1, k) = k*S(n, k) + S(n, k-1)
    return k * countP(n - 1, k) + countP(n - 1, k - 1);
}

// Driver program
int main()
{
    cout << countP(3, 2);
    return 0;
}
Each countP call implicitly considers a single element in the set; let's call it A.
The countP(n-1, k-1) term comes from the case where A is in a set by itself. In this case, we just have to count how many ways there are to partition all the other elements (N-1) into (K-1) subsets, since A takes up one subset by itself.
The k*countP(n-1, k) term, then, comes from the case where A is not in a set by itself. So we figure out the number of ways of partitioning all the other (N-1) values into K subsets, and multiply by K because there are K possible subsets we could add A to.
For example, consider the set [A,B,C,D], with K=2.
The first case, countP(n-1, k-1), describes the following situation:
{A, BCD}
The second case, k*countP(n-1, k), describes the following cases:
2*({BC,D}, {BD,C}, {B,CD})
Or:
{ABC,D}, {ABD,C}, {AB,CD}, {BC,AD}, {BD,AC}, {B,ACD}
How do we get countP(n, k)? Assume we have already divided the previous n-1 elements into some number of partitions, and now we have the n-th element and want to end up with k partitions.
We have two options:
Either we have divided the previous n-1 elements into k partitions (there are countP(n-1, k) ways of doing this), and we put the n-th element into one of those partitions (we have k choices). So we get k*countP(n-1, k).
Or we divide the previous n-1 elements into k-1 partitions (there are countP(n-1, k-1) ways of doing this), and we make the n-th element a partition by itself to reach k partitions (we have only 1 choice: putting it separately). So we get countP(n-1, k-1).
We sum them up and get the result.
What you mentioned is the Stirling numbers of the second kind, which count the number of ways to partition a set of n objects into k non-empty subsets, denoted by S(n, k).
The recurrence relation is:
S(n+1, k) = k*S(n, k) + S(n, k-1) for 0 < k < n+1,
with initial conditions:
S(0, 0) = 1, and S(n, 0) = S(0, n) = 0 for n > 0.
Calculating it with dynamic programming is faster than the recursive approach:

#include <vector>

long long secondKindStirlingNumber(int n, int k) {
    // sf[i][j] holds S(i, j); all entries start at 0
    std::vector<std::vector<long long>> sf(n + 1, std::vector<long long>(k + 1, 0));
    sf[0][0] = 1;  // one way to partition the empty set into zero subsets
    for (int i = 1; i <= n; i++) {
        for (int j = 1; j <= k; j++) {
            sf[i][j] = j * sf[i - 1][j] + sf[i - 1][j - 1];
        }
    }
    return sf[n][k];
}
Based on this, a partition of a set is a grouping of the set's elements into non-empty subsets, in such a way that every element is included in exactly one of the subsets. So the total number of partitions of an n-element set is the Bell number, which is calculated like this:
B(n) = the sum of S(n, k) for k = 0 to n
Hence, if you want to convert each term of that sum to a recursive function, it will be like:
k*countP(n-1, k) + countP(n-1, k-1);
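If you want the Bell number itself, here is a minimal sketch using the Bell triangle (bellNumber is my name; plain long long overflows beyond roughly n = 25, so swap in an arbitrary-precision type for larger n):

#include <vector>

// Bell number B(n) via the Bell triangle: each row starts with the
// last entry of the previous row, and each next entry adds the
// entry above-left; B(n) is the first entry of row n.
long long bellNumber(int n) {
    std::vector<std::vector<long long>> tri(n + 1);
    tri[0] = {1};
    for (int i = 1; i <= n; i++) {
        tri[i].push_back(tri[i - 1].back());
        for (long long x : tri[i - 1])
            tri[i].push_back(tri[i].back() + x);
    }
    return tri[n][0];  // e.g. bellNumber(3) == 5
}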

Given number N eliminate K digits to get maximum possible number

As the title says, the task is:
Given a number N, eliminate K digits to get the maximum possible number. The remaining digits must stay at their positions.
Example: n = 12345, k = 3, max = 45 (the first three digits are eliminated, and digits must not be moved to other positions).
Any idea how to solve this?
(It's not homework, I am preparing for an algorithm contest and solve problems on online judges.)
1 <= N <= 2^60, 1 <= K <= 20.
Edit: Here is my solution. It's working :)
#include <iostream>
#include <string>
#include <queue>
#include <vector>
#include <iomanip>
#include <algorithm>
#include <cmath>
using namespace std;

int main()
{
    string n;
    int k;
    cin >> n >> k;
    int b = n.size() - k - 1;
    int c = n.size() - b;
    int ind = 0;
    vector<char> res;
    char max = n.at(0);
    for (int i = 0; i < n.size() && res.size() < n.size() - k; i++) {
        max = n.at(i);
        ind = i;
        for (int j = i; j < i + c; j++) {
            if (n.at(j) > max) {
                max = n.at(j);
                ind = j;
            }
        }
        b--;
        c = n.size() - 1 - ind - b;
        res.push_back(max);
        i = ind;
    }
    for (int i = 0; i < res.size(); i++)
        cout << res.at(i);
    cout << endl;
    return 0;
}
Brute force should be fast enough for your restrictions: N will have at most 19 digits. Generate all nonnegative integers with numDigits(N) bits; if exactly k bits are set, remove the digits at the positions corresponding to the set bits, compare the result with the global optimum, and update if needed.
Complexity: O(2^log n * log n). While this may seem like a lot, and asymptotically the same as O(n), it's going to be much faster in practice, because the logarithm in O(2^log n * log n) is a base-10 logarithm, which gives a much smaller value (1 plus the base-10 logarithm of n is the number of digits of n).
You can avoid the log n factor by generating combinations of the n digit positions taken n - k at a time, building the number made up of the chosen n - k positions as you generate each combination (pass it as a parameter). This basically solves the equivalent problem: given n digits, pick n - k of them, in order, such that the resulting number is maximum.
Note: there is a method to solve this that does not involve brute force, but I wanted to show the OP this solution as well, since he asked in the comments how it could be brute-forced. For the optimal method, investigate what would happen if we built our number digit by digit from left to right and, for each digit d, removed all currently selected digits that are smaller than it. When can we remove them and when can't we?
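For what it's worth, a sketch of where that hint leads (a greedy pass that keeps a non-increasing sequence of selected digits, dropping smaller ones while deletions remain; the function name is mine):

#include <iostream>
#include <string>

// Remove k digits from n (as a string) so the remaining digits,
// kept in their original order, form the maximum possible number.
std::string maxAfterRemoving(const std::string& n, int k) {
    std::string kept;
    for (char d : n) {
        // A kept digit smaller than the incoming one can always be
        // deleted to improve the result, as long as deletions remain.
        while (!kept.empty() && k > 0 && kept.back() < d) {
            kept.pop_back();
            --k;
        }
        kept.push_back(d);
    }
    kept.resize(kept.size() - k);  // leftover deletions come off the tail
    return kept;
}

int main() {
    std::cout << maxAfterRemoving("12345", 3) << '\n';  // 45
}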
In the leftmost k+1 digits, find the largest one (say it is located at the i-th location; in case of multiple occurrences, choose the leftmost one). Keep it. Then repeat the algorithm with k_new = k - i + 1 on the new number formed by digits i+1 to n of the original number.
Eg. k=5 and number = 7454982641
First k+1 digits: 745498
Best number is 9 and it is located at location i=5.
new_k=1, new number = 82641
First k+1 digits: 82
Best number is 8 and it is located at i=1.
new_k=1, new number = 2641
First k+1 digits: 26
Best number is 6 and it is located at i=2
new_k=0, new number = 41
Answer: 98641
Complexity is O(n) where n is the size of the input number.
Edit: As iVlad mentioned, in the worst case the complexity can be quadratic. You can avoid that by maintaining a heap of size at most k+1, which makes the complexity O(n log k).
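A sketch of this windowed greedy (the function name is mine; as noted, the inner scan makes the worst case quadratic):

#include <iostream>
#include <string>

std::string maxAfterRemovingWindow(const std::string& num, int k) {
    std::string out;
    std::size_t start = 0;
    while (start < num.size()) {
        // Among the next k+1 digits, keep the leftmost maximum.
        std::size_t best = start;
        for (std::size_t j = start; j <= start + k && j < num.size(); j++)
            if (num[j] > num[best]) best = j;
        k -= best - start;       // the digits skipped over are deleted
        out += num[best];
        start = best + 1;
    }
    out.resize(out.size() - k);  // unused deletions come off the tail
    return out;
}

int main() {
    std::cout << maxAfterRemovingWindow("7454982641", 5) << '\n';  // 98641
}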
Following may help:

#include <algorithm>  // std::max
#include <vector>

void removeNumb(std::vector<int>& v, int k)
{
    if (k == 0) { return; }
    if (k >= v.size()) {
        v.clear();
        return;
    }
    for (int i = 0; i != v.size() - 1; )
    {
        if (v[i] < v[i + 1]) {
            // a digit smaller than its successor can always be dropped
            v.erase(v.begin() + i);
            if (--k == 0) { return; }
            i = std::max(i - 1, 0);
        } else {
            ++i;
        }
    }
    // digits are now non-increasing; drop the k smallest from the tail
    v.resize(v.size() - k);
}

How to reduce complexity of this code

Can anyone provide a better algorithm than trying all the combinations for this problem?
Given an array A of N numbers, find the number of distinct pairs (i,
j) such that j >=i and A[i] = A[j].
First line of the input contains number of test cases T. Each test
case has two lines, first line is the number N, followed by a line
consisting of N integers which are the elements of array A.
For each test case print the number of distinct pairs.
Constraints:
1 <= T <= 10
1 <= N <= 10^6
-10^6 <= A[i] <= 10^6 for 0 <= i < N
My idea: first sort the array, then find the frequency of every distinct integer, then sum nC2 over all the frequencies, and finally add the length of the array. But unfortunately it gives a wrong answer for some cases, which are not known. Help! Here is the implementation.
Code:
#include <iostream>
#include <cstdio>
#include <algorithm>
using namespace std;

long fun(long a) // to find aC2 for a given a
{
    if (a == 1) return 0;
    return (a * (a - 1)) / 2;
}

int main()
{
    long t, i, j, n, tmp = 0;
    long long count;
    long ar[1000000];
    cin >> t;
    while (t--)
    {
        cin >> n;
        for (i = 0; i < n; i++)
        {
            cin >> ar[i];
        }
        count = 0;
        sort(ar, ar + n);
        for (i = 0; i < n - 1; i++)
        {
            if (ar[i] == ar[i + 1])
            {
                tmp++;
            }
            else
            {
                count += fun(tmp + 1);
                tmp = 0;
            }
        }
        if (tmp != 0)
        {
            count += fun(tmp + 1);
        }
        cout << count + n << "\n";
    }
    return 0;
}
Keep a count of how many times each number appears in the array. Then iterate over the count array and add the triangular number for each count.
For example(from the source test case):
Input:
3
1 2 1
count array = {0, 2, 1} // no zeroes, two ones, one two
pairs = triangle(0) + triangle(2) + triangle(1)
pairs = 0 + 3 + 1
pairs = 4
Triangle numbers can be computed by (n * n + n) / 2, and the whole thing is O(n).
Edit:
First, there's no need to sort if you're counting frequencies. I see what you did with the sorting, but if you just keep a separate array of frequencies, it's easier. It takes more space, but since the elements and the array length are both constrained to at most 10^6, the most you'll need is an int[10^6]. This easily fits in the 256MB space requirement given in the challenge. (Whoops: since elements can be negative, you'll need an array twice that size. Still well under the limit, though.)
For the n choose 2 part, the part you had wrong is that it's an n+1 choose 2 problem. Since you can pair each element with itself, you have to add one to n. I know you were adding n at the end, but it's not the same: the difference between tri(n) and tri(n+1) is not one, but n.
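A sketch of that counting approach (names are mine; the 10^6 offset shifts negative elements to valid indices, as noted above):

#include <iostream>
#include <vector>

int main() {
    const int OFFSET = 1000000;  // values lie in [-10^6, 10^6]
    int t;
    std::cin >> t;
    while (t--) {
        int n;
        std::cin >> n;
        std::vector<long long> freq(2 * OFFSET + 1, 0);
        for (int i = 0; i < n; i++) {
            int x;
            std::cin >> x;
            freq[x + OFFSET]++;
        }
        long long pairs = 0;
        for (long long c : freq)
            pairs += c * (c + 1) / 2;  // triangle(c) pairs within each group
        std::cout << pairs << "\n";
    }
    return 0;
}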