Most equivalent factors of a number - c++

Given a number 'n', which is a power-of-2, how can I efficiently find the 2 factors which are most equivalent to eachother? In other words, if I have a linear array and want to map it to 2D, how can I find the 2D dimensions that are the most equal (image dimensions most close to a square)?
Gotta be some kind of bitwise operation to make this fast, rather than looping over factors.

n is representable as 2^k (since you say it's a power of 2). If k is even, then n == 2^(k/2) * 2^(k/2) (e.g. 16==4*4). If k is odd, then the closest you can get is n == 2^((k-1)/2) * 2^((k+1)/2) (e.g. 8==2*4)

Related

Count Divisors of Product from L to R

I have been solving a problem but then got stuck upon its subpart which is as follows:
Given an array of N elements whose ith element is A[i] and we are given Q queries of the type [L,R].
For each query output the number of divisors of product from Lth element to Rth element.
More formally, for each query lets define P as P = A[L] * A[L+1] * A[L+2] * ...* A[R].
Output the number of divisors of P modulo 998244353.
Constraints : 1<= N,Q <= 100000, 1<= A[i] <= 1000000.
My Approach,
For each index i, I have defined a map< int, int > which stores the prime divisor and its count in the product up to [1, i].
I am extracting the prime divisors of a number in O(LogN) using Sieve.
Then for each query (lets say {L,R} ), I am iterating through the map of Lth element and subtracting the count of each each key from the map of Rth element.
And then I am answering the query using the result:
if N = a^p * b^q * c^r ...(a,b,c being primes)
the number of divisors = (p+1)(q+1)(r+1)..
The time complexity of above solution is O(ND + QD), where D = number of distinct prime numbers upto 1000000. In worst case D = 78498.
Is there more efficient solution than this?
There is a more efficient solution for this. But it is slightly complicated. Here are steps to get to the necessary data structure.
Define a data type prime_factor that is a struct that contains a prime and a count.
Define a data type prime_factorization that is a vector of the first data type in ascending size of the primes. This can store the factorization of a number.
Write a function that takes a number, and turns its prime factorization into a prime_factorization
Write a function that takes 2 prime_factorization vectors and merges them into the factorization of the product of the two.
For each number in your array, compute its prime factorization. That gets stored in an array.
For each pair in your array, compute the prime factorization of the product. We will only need half of them. So elements 0, 1 go into one factorization, 2, 3 into the next and so on.
Repeat step 6 O(log(N)) times. So you have a vector of the factorization of each number, pairs, fours, eights, and so on. This results in approximately 2N precomputed factorization vectors. Most vectors are small though a few can be up to O(D) in size (where D is the number of distinct primes). Most of the merges should be very, very fast.
And now you have all of your data prepared. It can't take more than O(log(N)) times the space that storing the prime factors required by itself. (Less than that normally, though, because repeats among the small primes get gathered together in one prime_factor.)
Any range is the union of at most O(log(N)) of these computed vectors. For example the range 10..25 can be broken up into 10..11, 12..15, 16..24, 25. Arrange these intervals from smallest to largest and merge them. Then compute your answer from the result.
An exact analysis is complicated. But I assure you that query time is bounded above by O(Q * D * log(N)) and normally is much less than that.
UPDATE:
How do you find those intervals?
The answer is that you need to identify the number divisible by the highest power of 2 in the range, and then fill out both sides from there. And you figure that out by dividing by 2 (rounding down) until the range is of length 1. Then multiply the top boundary by 2 to find that mid-point.
For example if your range was 35-53 you would start by dividing by 2 to get 35-53, 17-26, 8-13, 4-6, 2-3. That was 2^4 we divided by. our power of 2 mid-point is 3*2^4 = 48. Our intervals above that midpoint are then 48-52, 53-53. Our intervals below are 40-47, 36-39, 35-35. And each of them is of length a power of 2 and starts at a number divisible by that power of 2.

Subsequence having sum at most 'k'

Given a non decreasing array A of size n and an integer k, how to find a subsequence S of the array A with maximum possible sum of its elements, such that this sum is at most k. If there are multiple such subsequences, we are interested in finding only one.
For example, let the array be {1, 2, 2, 4} so, n = 4 and let k = 7. Then, the answer should be {1, 2, 4}.
Brute force approach takes approximately O(n(2^n-1)) but is there a more efficient solution to this problem?
In the general case the answer is no.
Just deciding if there is a solution where elements sum up to k is equivalent to the Subset Sum Problem and thus already NP-complete.
The Subset Sum Problem can be equivalently formulated as: given the integers or
natural numbers w_1,... ,w_n does any subset of them sum to precisely W
However, if either n or the number of bits P that it takes to represent the largest number w is small there might be more efficient solution (e.g., a pseudo-polynomial solution based on dynamic programming if P is small). Additionally, if all your numbers w are positive then it might also be possible to find a better solution.

Predict the required number of preallocated nodes in a kD-Tree

I'm implementing a dynamic kD-Tree in array representation (storing the nodes in std::vector) in breadth-first fashion. Each i-th non-leaf node have a left child at (i<<1)+1 and a right child at (i<<1)+2. It would support incremental insertion of points and collection of points.
However I'm facing problem determining the required number of possible nodes to incrementally preallocate space.
I've found a formula on the web, which seems to be wrong:
N = min(m − 1, 2n − ½m − 1),
where m is the smallest power of 2 greater than or equal to n, the
number of points.
My implementation of the formula is the following:
size_t required(size_t n)
{
size_t m = nextPowerOf2(n);
return min(m - 1, (n<<1) - (m>>1) - 1);
}
function nextPowerOf2 returns a power of 2 largest or equal to n
Any help would be appreciated.
Each node of a kd-tree divides the space into two spaces. Hence, the number of nodes in the kd-tree depends on how you perform this division:
1) If you divide them in the midpoint of the space (that is, if the space is from x1 to x2, you divide the space with the x3=(x1+x2)/2 line), then:
i) Each point will be allocated its own node, and
ii) Each intermediate node will be empty.
In this case, the number of nodes will depend on how large the coordinates of the points are. If the coordinates are bounded by |X|, then the total number of nodes in the kd-tree should be slightly less than log |X| * n (more precisely, around log |X| * n - n log n + 2n) in the worst case. To see this, consider the following way to add the points: you add multiple collections, each collection has two extremely nearby points located at random. For each pair of point, the tree will need to continuously divide the space log |X| times, and if log |X| is significantly larger than log n, creating log |X| intermediate nodes in the process.
2) If you divide them by using a point as a dividing line, then each node (including the intermediate nodes) will contain a point. Thus, the total number of nodes is simply n. However, note that using a point to divide the space may yield to a very bad performance if the points are not given in a random order (for example, if the points are given in an ascending order of X, the depth of the tree would be O(n). For comparison, the depth of the tree in (1) is at most O(log |X|) ).

Generating random integers with a difference constraint

I have the following problem:
Generate M uniformly random integers from the range 0-N, where N >> M, and where no pair has a difference less than K. where M >> K.
At the moment the best method I can think of is to maintain a sorted list, then determine the lower bound of the current generated integer and test it with the lower and upper elements, if it's ok to then insert the element in between. This is of complexity O(nlogn).
Would there happen to be a more efficient algorithm?
An example of the problem:
Generate 1000 uniformly random integers between zero and 100million where the difference between any two integers is no less than 1000
A comprehensive way to solve this would be to:
Determine all the combinations of n-choose-m that satisfy the constraint, lets called it set X
Select a uniformly random integer i in the range [0,|X|).
Select the i'th combination from X as the result.
This solution is problematic when the n-choose-m is large, as enumerating and storing all possible combinations will be extremely costly. Hence an efficient online generating solution is sought.
Note: The following is a C++ implementation of the solution provided by pentadecagon
std::vector<int> generate_random(const int n, const int m, const int k)
{
if ((n < m) || (m < k))
return std::vector<int>();
std::random_device source;
std::mt19937 generator(source());
std::uniform_int_distribution<> distribution(0, n - (m - 1) * k);
std::vector<int> result_list;
result_list.reserve(m);
for (int i = 0; i < m; ++i)
{
result_list.push_back(distribution(generator));
}
std::sort(std::begin(result_list),std::end(result_list));
for (int i = 0; i < m; ++i)
{
result_list[i] += (i * k);
}
return result_list;
}
http://ideone.com/KOeR4R
.
EDIT: I adapted the text for the requirement to create ordered sequences, each with the same probability.
Create random numbers a_i for i=0..M-1 without duplicates. Sort them. Then create numbers
b_i=a_i + i*(K-1)
Given the construction, those numbers b_i have the required gaps, because the a_i already have gaps of at least 1. In order to make sure those b values cover exactly the required range [1..N], you must ensure a_i are picked from a range [1..N-(M-1)*(K-1)]. This way you get truly independent numbers. Well, as independent as possible given the required gap. Because of the sorting you get O(M log M) performance again, but this shouldn't be too bad. Sorting is typically very fast. In Python it looks like this:
import random
def random_list( N, M, K ):
s = set()
while len(s) < M:
s.add( random.randint( 1, N-(M-1)*(K-1) ) )
res = sorted( s )
for i in range(M):
res[i] += i * (K-1)
return res
First off: this will be an attempt to show that there's a bijection between the (M+1)- compositions (with the slight modification that we will allow addends to be 0) of the value N - (M-1)*K and the valid solutions to your problem. After that, we only have to pick one of those compositions uniformly at random and apply the bijection.
Bijection:
Let
Then the xi form an M+1-composition (with 0 addends allowed) of the value on the left (notice that the xi do not have to be monotonically increasing!).
From this we get a valid solution
by setting the values mi as follows:
We see that the distance between mi and mi + 1 is at least K, and mM is at most N (compare the choice of the composition we started out with). This means that every (M+1)-composition that fulfills the conditions above defines exactly one valid solution to your problem. (You'll notice that we only use the xM as a way to make the sum turn out right, we don't use it for the construction of the mi.)
To see that this gives a bijection, we need to see that the construction can be reversed; for this purpose, let
be a given solution fulfilling your conditions. To get the composition this is constructed from, define the xi as follows:
Now first, all xi are at least 0, so that's alright. To see that they form a valid composition (again, every xi is allowed to be 0) of the value given above, consider:
The third equality follows since we have this telescoping sum that cancels out almost all mi.
So we've seen that the described construction gives a bijection between the described compositions of N - (M-1)*K and the valid solutions to your problem. All we have to do now is pick one of those compositions uniformly at random and apply the construction to get a solution.
Picking a composition uniformly at random
Each of the described compositions can be uniquely identified in the following way (compare this for illustration): reserve N - (M-1)*K spaces for the unary notation of that value, and another M spaces for M commas. We get an (M+1)- composition of N - (M-1)*K by choosing M of the N - (M-1)*K + M spaces, putting commas there, and filling the rest with |. Then let x0 be the number of | before the first comma, xM+1 the number of | after the last comma, and all other xi the number of | between commas i and i+1. So all we have to do is pick an M-element subset of the integer interval[1; N - (M-1)*K + M] uniformly at random, which we can do for example with the Fisher-Yates shuffle in O(N + M log M) (we need to sort the M delimiters to build the composition) since M*K needs to be in O(N) for any solutions to exist. So if N is bigger than M by at least a logarithmic factor, then this is linear in N.
Note: #DavidEisenstat suggested that there are more space efficient ways of picking the M-element subset of that interval; I'm not aware of any, I'm afraid.
You can get an error-proof algorithm out of this by doing the simple input validation we get from the construction above that N ≥ (M-1) * K and that all three values are at least 1 (or 0, if you define the empty set as a valid solution for that case).
Why not do this:
for (int i = 0; i < M; ++i) {
pick a random number between K and N/M
add this number to (N/M)* i;
Now you have M random numbers, distributed evenly along N, all of which have a difference of at least K. It's in O(n) time. As an added bonus, it's already sorted. :-)
EDIT:
Actually, the "pick a random number" part shouldn't be between K and N/M, but between min(K, [K - (N/M * i - previous value)]). That would ensure that the differences are still at least K, and not exclude values that should not be missed.
Second EDIT:
Well, the first case shouldn't be between K and N/M - it should be between 0 and N/M. Just like you need special casing for when you get close to the N/M*i border, we need special initial casing.
Aside from that, the issue you brought up in your comments was fair representation, and you're right. As my pseudocode is presented, it currently completely misses the excess between N/M*M and N. It's another edge case; simply change the random values of your last range.
Now, in this case, your distribution will be different for the last range. Since you have more numbers, you have slightly less chance for each number than you do for all the other ranges. My understanding is that because you're using ">>", this shouldn't really impact the distribution, i.e. the difference in size in the sample set should be nominal. But if you want to make it more fair, you divide the excess equally among each range. This makes your initial range calculation more complex - you'll have to augment each range based on how much remainder there is divided by M.
There are lots of special cases to look out for, but they're all able to be handled. I kept the pseudocode very basic just to make sure that the general concept came through clearly. If nothing else, it should be a good starting point.
Third and Final EDIT:
For those worried that the distribution has a forced evenness, I still claim that there's nothing saying it can't. The selection is uniformly distributed in each segment. There is a linear way to keep it uneven, but that also has a trade-off: if one value is selected extremely high (which should be unlikely given a very large N), then all the other values are constrained:
int prevValue = 0;
int maxRange;
for (int i = 0; i < M; ++i) {
maxRange = N - (((M - 1) - i) * K) - prevValue;
int nextValue = random(0, maxRange);
prevValue += nextValue;
store previous value;
prevValue += K;
}
This is still linear and random and allows unevenness, but the bigger prevValue gets, the more constrained the other numbers become. Personally, I prefer my second edit answer, but this is an available option that given a large enough N is likely to satisfy all the posted requirements.
Come to think of it, here's one other idea. It requires a lot more data maintenance, but is still O(M) and is probably the most fair distribution:
What you need to do is maintain a vector of your valid data ranges and a vector of probability scales. A valid data range is just the list of high-low values where K is still valid. The idea is you first use the scaled probability to pick a random data range, then you randomly pick a value within that range. You remove the old valid data range and replace it with 0, 1 or 2 new data ranges in the same position, depending on how many are still valid. All of these actions are constant time other than handling the weighted probability, which is O(M), done in a loop M times, so the total should be O(M^2), which should be much better than O(NlogN) because N >> M.
Rather than pseudocode, let me work an example using OP's original example:
0th iteration: valid data ranges are from [0...100Mill], and the weight for this range is 1.0.
1st iteration: Randomly pick one element in the one element vector, then randomly pick one element in that range.
If the element is, e.g. 12345678, then we remove the [0...100Mill] and replace it with [0...12344678] and [12346678...100Mill]
If the element is, e.g. 500, then we remove the [0...100Mill] and replace it with just [1500...100Mill], since [0...500] is no longer a valid range. The only time we will replace it with 0 ranges is in the unlikely event that you have a range with only one number in it and it gets picked. (In that case, you'll have 3 numbers in a row that are exactly K apart from each other.)
The weight for the ranges are their length over the total length, e.g. 12344678/(12344678 + (100Mill - 12346678)) and (100Mill - 12346678)/(12344678 + (100Mill - 12346678))
In the next iterations, you do the same thing: randomly pick a number between 0 and 1 and determine which of the ranges that scale falls into. Then randomly pick a number in that range, and replace your ranges and scales.
By the time it's done, we're no longer acting in O(M), but we're still only dependent on the time of M instead of N. And this actually is both uniform and fair distribution.
Hope one of these ideas works for you!

Find a prime number?

To find whether N is a prime number we only need to look for all numbers less or equal to sqrt(N). Why is that? I am writing a C code so trying to understand a reason behind it.
N is prime if it is a positive integer which is divisible by exactly two positive integers, 1 and N. Since a number's divisors cannot be larger than that number, this gives rise to a simple primality test:
If an integer N, greater than 1, is not divisible by any integer in the range [2, N-1], then N is prime. Otherwise, N is not prime.
However, it would be nice to modify this test to make it faster. So let us investigate.
Note that the divisors of N occur in pairs. If N is divisible by a number M, then it is also divisible by N/M. For instance, 12 is divisble by 6, and so also by 2. Furthermore, if M >= sqrt(N), then N/M <= sqrt(N).
This means that if no numbers less than or equal to sqrt(N) divide N, no numbers greater than sqrt(N) divide N either (excepting 1 and N themselves), otherwise a contradiction would arise.
So we have a better test:
If an integer N, greater than 1, is not divisible by any integer in the range [2, sqrt(N)], then N is prime. Otherwise, N is not prime.
if you consider the reasoning above, you should see that a number which passes this test also passes the first test, and a number which fails this test also fails the first test. The tests are therefore equivalent.
A composite number (one that is not prime, or 1) has at least 1 pair of factors, and it is guaranteed that one of the numbers from each pair is less than or equal to the square root of the number (which is what you are asking about).
If you square the square root of the number, you get the number itself (sqrt(n) * sqrt(n) = n), so if you made one of the numbers bigger (than sqrt(n)) you would have to make the other one smaller. If you then only check the numbers 2 through sqrt(n) you will have checked all of the possible factors, since each of those factors will be paired with a number that is greater than sqrt(n) (except of course if the number is in fact a square of some other number, like 4, 9, 16, etc...but that doesn't matter since you know they aren't prime; they are easily factored by sqrt(n) itself).
The reason is simple, any number bigger than the sqrt, will cause the other multiplier, to be smaller than the sqrt. In such case, you should have already check it.
Let n=a×b be composite.
Assume a>sqrt(n) and b>sqrt(n).
a×b > sqrt(n)×sqrt(n)
a×b > n
But we know a×b=n, therefore a<sqrt(n) or b<sqrt(n).
Since you only need to know a or b to show n is composite, you only need to check the numbers up to sqrt(n) to find such a number.
Because in the worst case, number n can be expresed as a2.
If the number can be expressed diferently, that men that one of divisors will be less than a = sqrt(n), but the other can be greater.