I saw a question on CareerCup, but the answers there are not what I am looking for. I wrote a solution myself and would like comments on my analysis of the time complexity and on the algorithm and code. Or you could provide a better algorithm in terms of time. Thanks.
You are given d > 0 fair dice, each with n > 0 "sides"; write a function that returns a histogram of the frequency of each possible sum of the dice rolls.
For example, for 2 dice, each with 3 sides, the results are:
(1, 1) -> 2
(1, 2) -> 3
(1, 3) -> 4
(2, 1) -> 3
(2, 2) -> 4
(2, 3) -> 5
(3, 1) -> 4
(3, 2) -> 5
(3, 3) -> 6
And the function should return:
2: 1
3: 2
4: 3
5: 2
6: 1
(My solution.) The time complexity of a brute-force depth-first search is O(n^d). However, you can use a DP idea to solve this problem. For example, with d=3 and n=3, you can reuse the result for d==1 when computing d==2:
d==1
num  #
 1   1
 2   1
 3   1

d==2
first roll      second roll is 1
num  #          num  #
 1   1           2   1
 2   1    ->     3   1
 3   1           4   1

first roll      second roll is 2
num  #          num  #
 1   1           3   1
 2   1    ->     4   1
 3   1           5   1

first roll      second roll is 3
num  #          num  #
 1   1           4   1
 2   1    ->     5   1
 3   1           6   1

Therefore, after the second roll:
num  #
 2   1
 3   2
 4   3
 5   2
 6   1
The time complexity of this DP algorithm is

SUM_{i=2..d} { n * [ n(i-1) - (i-1) + 1 ] }  ~  O(n^2 * d^2)

where the bracketed term is the number of possible sums of the previous i-1 rolls (e.g. for d=2, n=3 the updated sums range from 2 to 6, i.e. 3*3 = 9 additions in total).
The code, written in C++, is as follows:
#include <utility>
#include <vector>
using namespace std;

vector<pair<int,long long>> diceHisto(int numSide, int numDice) {
    int n = numSide*numDice;
    vector<long long> cur(n+1, 0), nxt(n+1, 0);
    // distribution of a single die
    for(int i=1; i<=numSide; i++) cur[i] = 1;
    for(int i=2; i<=numDice; i++) {
        int start = i-1, end = (i-1)*numSide; // range of the previous sum of rolls
        for(int j=1; j<=numSide; j++) {
            for(int k=start; k<=end; k++)
                nxt[k+j] += cur[k];
        }
        swap(cur, nxt);
        // clear the old counts before the next round
        for(int j=start; j<=end; j++) nxt[j] = 0;
    }
    vector<pair<int,long long>> result;
    for(int i=numDice; i<=numSide*numDice; i++)
        result.push_back({i, cur[i]});
    return result;
}
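As a quick sanity check (my addition, not part of the original post), here is a small driver that, compiled together with the function above, reproduces the example from the question (2 dice, 3 sides each):
#include <cstdio>

int main() {
    for (const auto& p : diceHisto(3, 2))
        printf("%d: %lld\n", p.first, p.second);   // prints 2: 1, 3: 2, 4: 3, 5: 2, 6: 1
    return 0;
}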
You can do it in O(n*d^2). First, note that the generating function for an n-sided die is p(n) = x+x^2+x^3+...+x^n, and that the distribution of the sum of d throws has generating function p(n)^d. Representing the polynomials as arrays, you need O(nd) coefficients, and multiplying by p(n) can be done in a single pass in O(nd) time by keeping a rolling sum.
Here's some python code that implements this. It has one non-obvious optimisation: it throws out a factor x from each p(n) (or equivalently, it treats the dice as having faces 0,1,2,...,n-1 rather than 1,2,3,...,n) which is why d is added back in when showing the distribution.
def dice(n, d):
    # r[i] = number of ways to reach sum i with the dice processed so far
    # (each die contributes 0..n-1; see the note about the factored-out x above)
    r = [1] + [0] * (n - 1) * d
    nr = [0] * len(r)
    for k in range(d):
        t = 0                      # rolling sum of the last n coefficients
        for i in range(len(r)):
            t += r[i]
            if i >= n:
                t -= r[i - n]
            nr[i] = t
        r, nr = nr, r
    return r

def show_dist(n, d):
    for i, k in enumerate(dice(n, d)):
        if k:
            print(i + d, k)

show_dist(6, 3)
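As a quick check (my addition), show_dist(6, 3) should begin with
3 1
4 3
5 6
6 10
7 15
which matches the well-known distribution of the sum of three six-sided dice.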
The time and space complexity are easy to see: there are nested loops with d and (n-1)*d + 1 iterations, so the time complexity is O(n*d^2); and there are two arrays of size O(nd) and no other allocation, so the space complexity is O(nd).
Just in case, here is a simple example in Python using the OpenTurns platform.
import openturns as ot
d = 2 # number of dice
n = 6 # number of sides per die
# possible values
dice_distribution = ot.UserDefined([[i] for i in range(1, n + 1)])
# create the distribution of the sum of the d dice
sum_distribution = sum([dice_distribution] * d)
That's it!
print(sum_distribution)
will show you all the possible values and their corresponding probabilities:
>>> UserDefined(
{x = [2], p = 0.0277778},
{x = [3], p = 0.0555556},
{x = [4], p = 0.0833333},
{x = [5], p = 0.111111},
{x = [6], p = 0.138889},
{x = [7], p = 0.166667},
{x = [8], p = 0.138889},
{x = [9], p = 0.111111},
{x = [10], p = 0.0833333},
{x = [11], p = 0.0555556},
{x = [12], p = 0.0277778}
)
You can also draw the probability distribution function:
sum_distribution.drawPDF()
Related
Let's say I have a total number
tN = 12
and a set of elements
elem = [1,2,3,4]
and a probability for each element to be taken
prob = [0.0, 0.5, 0.75, 0.25]
I need to get a random multiset of these elements, such that:
the taken elements reflect the probabilities
the sum of the elements is tN
With the example above, here are some possible outcomes:
3 3 2 4
2 3 2 3 2
3 4 2 3
2 2 3 3 2
3 2 3 2 2
At the moment, the maximum tN will be 64, and the elements are the ones above (1, 2, 3, 4).
Is this a knapsack problem? How would you solve it easily? Both an "on the fly" and a "pre-calculated" approach are allowed (or at least, it depends on the computation time). I'm doing it for a C++ app.
Goal: the final sequence doesn't need to match the percentages exactly; elements with a higher probability should simply be more likely to appear. In short: in the example, I'd prefer sequences with more 3s and 2s rather than 4s, and no 1s.
Here's an attempt to select elements according to their probabilities, over 10 draws:
Randomizer randomizer;
int tN = 12;
std::vector<int> elem = {2, 3, 4};
std::vector<float> prob = {0.5f, 0.75f, 0.25f};
float probSum = std::accumulate(begin(prob), end(prob), 0.0f, std::plus<float>());
std::vector<float> probScaled;
for (size_t i = 0; i < prob.size(); i++)
{
probScaled.push_back((i == 0 ? 0.0f : probScaled[i - 1]) + (prob[i] / probSum));
}
for (size_t r = 0; r < 10; r++)
{
float rnd = randomizer.getRandomValue();
int index = 0;
for (size_t i = 0; i < probScaled.size(); i++)
{
if (rnd < probScaled[i])
{
index = i;
break;
}
}
std::cout << elem[index] << std::endl;
}
which gives, for example, this choice:
3
3
2
2
4
2
2
4
3
3
Now I just need to build a multiset which sums up to tN. Any tips?
I am trying to find the Time Complexity of this algorithm.
The iterative algorithm produces all the bit-strings within a given Hamming distance from the input bit-string. It generates all increasing sequences 0 <= a[0] < ... < a[dist-1] < strlen(num), and inverts the bits at the corresponding indices.
The vector a is supposed to keep the indices at which bits have to be inverted. So if a contains the current index i, we print 1 instead of 0 and vice versa. Otherwise we print the bit as is (see the else-part), as shown below:
// e.g. hamming("0000", 2);
void hamming(const char* num, size_t dist) {
assert(dist > 0);
vector<int> a(dist);
size_t k = 0, n = strlen(num);
a[k] = -1;
while (true)
if (++a[k] >= n)
if (k == 0)
return;
else {
--k;
continue;
}
else
if (k == dist - 1) {
// this is an O(n) operation and will be called
// (n choose dist) times, in total.
print(num, a);
}
else {
a[k+1] = a[k];
++k;
}
}
What is the Time Complexity of this algorithm?
My attempt says:
dist * n + (n choose t) * n + 2
but this seems not to be true, consider the following examples, all with dist = 2:
len = 3, (3 choose 2) = 3 * O(n), 10 while iterations
len = 4, (4 choose 2) = 6 * O(n), 15 while iterations
len = 5, (5 choose 2) = 10 * O(n), 21 while iterations
len = 6, (6 choose 2) = 15 * O(n), 28 while iterations
Here are two representative runs (with the print happening at the start of the loop):
000, len = 3
k = 0, total_iter = 1
vector a = -1 0
k = 1, total_iter = 2
vector a = 0 0
Paid O(n)
k = 1, total_iter = 3
vector a = 0 1
Paid O(n)
k = 1, total_iter = 4
vector a = 0 2
k = 0, total_iter = 5
vector a = 0 3
k = 1, total_iter = 6
vector a = 1 1
Paid O(n)
k = 1, total_iter = 7
vector a = 1 2
k = 0, total_iter = 8
vector a = 1 3
k = 1, total_iter = 9
vector a = 2 2
k = 0, total_iter = 10
vector a = 2 3
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
gsamaras#pythagoras:~/Desktop/generate_bitStrings_HammDistanceT$ ./iter
0000, len = 4
k = 0, total_iter = 1
vector a = -1 0
k = 1, total_iter = 2
vector a = 0 0
Paid O(n)
k = 1, total_iter = 3
vector a = 0 1
Paid O(n)
k = 1, total_iter = 4
vector a = 0 2
Paid O(n)
k = 1, total_iter = 5
vector a = 0 3
k = 0, total_iter = 6
vector a = 0 4
k = 1, total_iter = 7
vector a = 1 1
Paid O(n)
k = 1, total_iter = 8
vector a = 1 2
Paid O(n)
k = 1, total_iter = 9
vector a = 1 3
k = 0, total_iter = 10
vector a = 1 4
k = 1, total_iter = 11
vector a = 2 2
Paid O(n)
k = 1, total_iter = 12
vector a = 2 3
k = 0, total_iter = 13
vector a = 2 4
k = 1, total_iter = 14
vector a = 3 3
k = 0, total_iter = 15
vector a = 3 4
The while loop is somewhat clever and subtle, and it's arguable that it's doing two different things (or even three if you count the initialisation of a). That's what's making your complexity calculations challenging, and it's also less efficient than it could be.
In the abstract, to incrementally compute the next set of indices from the current one, the idea is to find the last position i whose value a[i] is less than n-dist+i, increment a[i], and set the following indices to a[i]+1, a[i]+2, and so on.
For example, if dist=5, n=11 and your indexes are:
0, 3, 5, 9, 10
Then 5 is the last value less than n-dist+i (because n-dist is 6, and 10=6+4, 9=6+3, but 5<6+2).
So we increment 5, and set the subsequent integers to get the set of indexes:
0, 3, 6, 7, 8
Now consider how your code runs, assuming k=4
0, 3, 5, 9, 10
a[k] + 1 is 11, so k becomes 3.
++a[k] is 10, so a[k+1] becomes 10, and k becomes 4.
++a[k] is 11, so k becomes 3.
++a[k] is 11, so k becomes 2.
++a[k] is 6, so a[k+1] becomes 6, and k becomes 3.
++a[k] is 7, so a[k+1] becomes 7, and k becomes 4.
++a[k] is 8, and we continue to call the print function.
This code is correct, but it's not efficient, because k scuttles backwards and forwards as it searches for the highest index that can be incremented without causing an overflow in the higher indices. In fact, if the highest such index is j positions from the end, the code uses a non-linear number of iterations of the while loop. You can easily demonstrate this yourself if you trace how many iterations of the while loop occur when n==dist for different values of n. There is exactly one line of output, but you'll see an O(2^n) growth in the number of iterations (in fact, you'll see 2^(n+1)-2 iterations).
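Here is a small sketch (my own, not part of the original answer) that re-runs the same index-walking logic without printing, just to count the while-loop iterations for n == dist and confirm the 2^(n+1)-2 figure:
#include <cstdio>
#include <vector>

long long count_iterations(size_t n, size_t dist) {
    std::vector<int> a(dist);
    size_t k = 0;
    a[k] = -1;
    long long iters = 0;
    while (true) {
        ++iters;
        if (++a[k] >= (int)n) {
            if (k == 0) return iters;
            --k;
        } else if (k != dist - 1) {
            a[k + 1] = a[k];
            ++k;
        }
        // when k == dist - 1 and a[k] < n, the original code would print here
    }
}

int main() {
    for (size_t n = 1; n <= 10; n++)
        printf("n = dist = %zu: %lld iterations (2^(n+1)-2 = %lld)\n",
               n, count_iterations(n, n), (2LL << n) - 2);
}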
This scuttling makes your code needlessly inefficient, and also hard to analyse.
Instead, you can write the code in a more direct way:
void hamming2(const char* num, size_t dist) {
int a[dist];
for (int i = 0; i < dist; i++) {
a[i] = i;
}
size_t n = strlen(num);
while (true) {
print(num, a);
int i;
for (i = dist - 1; i >= 0; i--) {
if (a[i] < n - dist + i) break;
}
if (i < 0) return;
a[i]++;
for (int j = i+1; j<dist; j++) a[j] = a[i] + j - i;
}
}
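For example, hamming2("0000", 2) prints the same C(4, 2) = 6 bit-strings as the original, but uses exactly 6 iterations of the while loop, one per index set.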
Now, each pass through the while loop produces a new set of indices. The exact cost per iteration is not straightforward, but since print is O(n), and the remaining code in the while loop is at worst O(dist), the overall cost is O(N_INCR_SEQ(n, dist) * n), where N_INCR_SEQ(n, dist) is the number of increasing sequences of natural numbers < n of length dist (which is simply n choose dist). Someone in the comments provides a link that gives a formula for this.
Notice, that given n which represents the length, and t which represents the distance required, the number of increasing, non-negative series of t integers between 1 and n (or in indices form, between 0 and n-1) is indeed n choose t, since we pick t distinct indices.
The problem occurs with your generation of those series:
- First, notice that, for example, in the case of length 4, you actually go over 5 different index values, 0 to 4.
- Secondly, notice that you are taking into account series with identical indices (in the case of t=2: 0 0, 1 1, 2 2 and so on); in general, you go through every non-decreasing series, instead of only every strictly increasing series.
So for calculating the TC of your program, make sure you take that into account.
Hint: try to make one-to-one correspondence from the universe of those series, to the universe of integer solutions to some equation.
If you need the direct solution, take a look here:
https://math.stackexchange.com/questions/432496/number-of-non-decreasing-sequences-of-length-m
The final solution is (n+t-1) choose (t), but noting the first bullet, in your program it's actually ((n+1)+t-1) choose (t), since you loop over one extra index value.
Denote
A := ((n+1)+t-1) choose (t),   B := (n choose t).
Overall we get O(1) + B*O(n) + (A-B)*O(1).
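As a quick check of that count (my addition): with t = 2, A = C(n+2, 2), which reproduces the iteration counts observed in the question:
len = 3: C(5, 2) = 10 iterations
len = 4: C(6, 2) = 15 iterations
len = 5: C(7, 2) = 21 iterations
len = 6: C(8, 2) = 28 iterations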
int sum_down(int x)
{
if (x >= 0)
{
x = x - 1;
int y = x + sum_down(x);
return y + sum_down(x);
}
else
{
return 1;
}
}
What is the smallest integer value of the parameter x such that the returned value is greater than 1,000,000?
Right now I am just doing it by trial and error, and since this question is asked on paper, I don't think I will have enough time for that. How do you visualise this quickly so that it can be solved easily? I am new to programming, so thanks in advance!
The recursion logic:
x = x - 1;
int y = x + sum_down(x);
return y + sum_down(x);
can be simplified to:
x = x - 1;
int y = x + sum_down(x) + sum_down(x);
return y;
which can be simplified to:
int y = (x-1) + sum_down(x-1) + sum_down(x-1);
return y;
which can be simplified to:
return (x-1) + 2*sum_down(x-1);
Put in mathematical form,
f(N) = (N-1) + 2*f(N-1)
with the recursion terminating when N is -1. f(-1) = 1.
Hence,
f(0) = -1 + 2*1 = 1
f(1) = 0 + 2*1 = 2
f(2) = 1 + 2*2 = 5
...
f(18) = 17 + 2*f(17) = 524269
f(19) = 18 + 2*524269 = 1048556, which is the first value greater than 1,000,000, so the answer is x = 19.
Your program can be written this way (sorry about c#):
public static void Main()
{
int i = 0;
int j = 0;
do
{
i++;
j = sum_down(i);
Console.Out.WriteLine("j:" + j);
} while (j < 1000000);
Console.Out.WriteLine("i:" + i);
}
static int sum_down(int x)
{
if (x >= 0)
{
return x - 1 + 2 * sum_down(x - 1);
}
else
{
return 1;
}
}
So at the first iteration you'll get 2, then 5, then 12... So you can neglect the x-1 part, since it stays small compared to the multiplication.
So we have:
i = 1 => sum_down ~= 4 (real is 2)
i = 2 => sum_down ~= 8 (real is 5)
i = 3 => sum_down ~= 16 (real is 12)
i = 4 => sum_down ~= 32 (real is 27)
i = 5 => sum_down ~= 64 (real is 58)
So we can say that sum_down(x) ~= 2^(x+1). Then it's just basic math: the smallest x with 2^(x+1) > 1,000,000 is x = 19.
A bit late, but it's not that hard to get an exact non-recursive formula.
Write it up mathematically, as explained in other answers already:
f(-1) = 1
f(x) = 2*f(x-1) + x-1
This is the same as
f(-1) = 1
f(x+1) = 2*f(x) + x
(just switched from x and x-1 to x+1 and x, difference 1 in both cases)
The first few x and f(x) are:
x: -1 0 1 2 3 4
f(x): 1 1 2 5 12 27
And while there are many arbitrarily complicated ways to transform this into a non-recursive formula, for simple ones it often helps to write down the difference between consecutive elements:
x:     -1   0   1   2   3   4
f(x):   1   1   2   5  12  27
diff:       0   1   3   7  15
So, for some x
f(x+1) - f(x) = 2^(x+1) - 1
f(x+2) - f(x) = (f(x+2) - f(x+1)) + (f(x+1) - f(x)) = 2^(x+2) + 2^(x+1) - 2
f(x+n) - f(x) = sum[0<=i<n](2^(x+1+i)) - n
With e.g. x = 0 inserted, to turn f(x+n) into f(n):
f(x+n) - f(x) = sum[0<=i<n](2^(x+1+i)) - n
f(0+n) - f(0) = sum[0<=i<n](2^(0+1+i)) - n
f(n) - 1 = sum[0<=i<n](2^(i+1)) - n
f(n) = sum[0<=i<n](2^(i+1)) - n + 1
f(n) = sum[0<i<=n](2^i) - n + 1
f(n) = (2^(n+1) - 2) - n + 1
f(n) = 2^(n+1) - n - 1
No recursion anymore.
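To tie this back to the original question, here is a small C++ check (my own sketch; closed_form is a name I made up, and sum_down is the simplified recursion from the earlier answer) that the closed form agrees with the recursion and yields the answer 19:
#include <cstdio>

long long sum_down(long long x) {                 // recursion from the question
    return x >= 0 ? (x - 1) + 2 * sum_down(x - 1) : 1;
}

long long closed_form(long long n) {              // f(n) = 2^(n+1) - n - 1
    return (1LL << (n + 1)) - n - 1;
}

int main() {
    long long x = 0;
    while (closed_form(x) <= 1000000) {
        if (sum_down(x) != closed_form(x))
            printf("mismatch at %lld\n", x);      // never triggers
        x++;
    }
    printf("smallest x: %lld, f(x) = %lld\n", x, closed_form(x));  // 19, 1048556
}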
How about this :
int x = 0;
while (sum_down(x) <= 1000000)
{
x++;
}
The loop increments x until the result of sum_down(x) is greater than 1,000,000.
Edit: the result would be 19.
While trying to understand and simplify the recursion logic behind the sum_down() function is enlightening and informative, this snippet aims to be pragmatic: it does not try to solve the problem analytically, but simply in terms of results.
Two lines of Python code to answer your question:
>>> from itertools import *   # needed for dropwhile() and count()
Define the recursive function (see R Sahu's answer):
>>> f = lambda x: 1 if x<0 else (x-1) + 2*f(x-1)
Then use the dropwhile() function to remove elements from the list [0, 1, 2, 3, ....] for which f(x)<=1000000, resulting in a list of integers for which f(x) > 1000000. Note: count() returns an infinite "list" of [0, 1, 2, ....]
The dropwhile() function returns a Python generator so we use next() to get the first value of the list:
>>> next(dropwhile(lambda x: f(x)<=1000000, count()))
19
consider that
0 -- is the first
1 -- is the second
2 -- is the third
.....
9 -- is the 10th
11 -- is the 11th
what is an efficient algorithm to find the nth palindromic number?
I'm assuming that 0110 is not a palindrome, as it is 110.
I could spend a lot of words describing it, but this table should be enough:
#Digits   #Pal.   Notes
   0         1    "0" only
   1         9    x      with x = 1..9
   2         9    xx     with x = 1..9
   3        90    xyx    with xy = 10..99 (in other words: x = 1..9, y = 0..9)
   4        90    xyyx   with xy = 10..99
   5       900    xyzyx  with xyz = 100..999
   6       900    and so on...
The (nonzero) palindromes with an even number of digits start at p(11) = 11, p(110) = 1001, p(1100) = 100'001, .... They are constructed by taking the index n - 10^L, where L=floor(log10(n)), and appending the reversal of this number: p(1101) = 101|101, p(1102) = 102|201, ..., p(1999) = 999|999, etc. This case must be considered for indices n >= 1.1*10^L but n < 2*10^L.
When n >= 2*10^L, we get the palindromes with an odd number of digits, which start at p(2) = 1, p(20) = 101, p(200) = 10001, etc., and can be constructed the same way, using again n - 10^L with L=floor(log10(n)), and appending the reversal of that number, now without its last digit: p(21) = 11|1, p(22) = 12|1, ..., p(99) = 89|8, ....
When n < 1.1*10^L, subtract 1 from L to be in the correct setting with n >= 2*10^L for the case of an odd number of digits.
This yields the simple algorithm:
p(n) = { L = logint(n,10);
         P = 10^(L - [1 < n < 1.1*10^L]);   /* avoid exponent -1 for n=1 */
         n -= P;
         RETURN( n * P + reverse( n \ 10^[n >= P] ) )
       }
where [...] is 1 if ... is true, 0 else, and \ is integer division.
(The expression n \ 10^[...] is equivalent to: if ... then n\10 else n.)
(I added the condition 1 < n in the exponent to avoid P = 10^(-1) for n=1. If you use integer types, you don't need this. Another choice is to put max(...,0) as the exponent in P, or to return 0 right at the start for n=1. Also notice that you don't need L after assigning P, so you could use the same variable for both.)
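For concreteness, here is a direct C++ transcription of that pseudocode (my own sketch; pow10ll, reverse_digits and nth_palindrome are names I made up):
#include <cstdio>

long long pow10ll(int e) {                   // small helper: 10^e
    long long p = 1;
    while (e-- > 0) p *= 10;
    return p;
}

long long reverse_digits(long long m) {      // plays the role of reverse()
    long long r = 0;
    for (; m > 0; m /= 10) r = r * 10 + m % 10;
    return r;
}

// p(n): the n-th palindrome, with p(1) = 0, p(2) = 1, ..., p(11) = 11, ...
long long nth_palindrome(long long n) {
    int L = 0;                               // L = floor(log10(n))
    for (long long t = n; t >= 10; t /= 10) L++;
    if (n > 1 && 10 * n < 11 * pow10ll(L))   // 1 < n < 1.1*10^L
        L--;                                 // odd digit count: subtract 1 from L
    long long P = pow10ll(L);
    n -= P;
    long long tail = (n >= P) ? n / 10 : n;  // drop the last digit for odd length
    return n * P + reverse_digits(tail);
}

int main() {
    for (int i = 1; i <= 25; i++)
        printf("%lld ", nth_palindrome(i));
    printf("\n");   // 0 1 ... 9 11 22 ... 99 101 111 121 131 141 151
}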
I have a range of numbers from 100 to 999. I need to get each digit of such a number separately and check whether it can be divided by 2. For example:
232
2 divided by 2 = 1 = true
3 divided by 2 = 1.5 = false
2 divided by 2 = 1 = true
and so on.
To get the first digit, all I have to do is divide the entire number by 100.
int x = 256;
int k = x/100;
so k would hold the value 2.
Now, is there a way to get the other digits? Because x/10 would already give 25.
Try this:
int x = 256;
int i = x / 100; // i is 2
int j = (x % 100) / 10; // j is 5
int k = (x % 10); // k is 6
Maybe look into integer division and the modulo operator.
int k1 = (x / 10) % 10;    // "10s"
int k2 = (x / 100) % 10;   // "100s"
// etc.
Use modulo to get the last digit of the number, then divide by ten to discard the last digit.
Repeat while the number is non-zero.
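A minimal sketch of that loop (my own, using 232 as in the question; note it visits the digits from last to first):
#include <cstdio>

int main() {
    int x = 232;
    while (x != 0) {
        int digit = x % 10;                  // last digit
        printf("%d -> %s\n", digit, digit % 2 == 0 ? "divisible by 2" : "not divisible by 2");
        x /= 10;                             // discard the last digit
    }
    return 0;
}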
What you need is the modulus operator %. It does a division and returns the remainder.
1 % 2 = 1
2 % 2 = 0
3 % 2 = 1
4 % 2 = 0
...
eg. take 232:
int num = 232;
int at_ones_place = num % 10;
int at_tens_place = ( num /10 ) % 10 ;
int at_hundreds_place = (num /100);