I have a word corpus of say 3000 words such as [hello, who, this ..].
I want to find the nth 3-word combination from this corpus. I am fine with any order as long as the algorithm gives consistent output.
What would be the time complexity of the algorithm?
I have seen this answer but was looking for something simple.
(Note that I will be using 1-based indexes and ranks throughout this answer.)
To generate all combinations of 3 elements from a list of n elements, we'd take all elements from 1 to n-2 as the first element, then for each of these we'd take all elements after the first element up to n-1 as the second element, then for each of these we'd take all elements after the second element up to n as the third element. This gives us a fixed order, and a direct relation between the rank and a specific combination.
If we take element i as the first element, there are (n-i choose 2) possibilities for the second and third element, and thus (n-i choose 2) combinations with i as the first element. If we then take element j as the second element, there are (n-j choose 1) = n-j possibilities for the third element, and thus n-j combinations with i and j as the first two elements.
Linear search in tables of binomial coefficients
With tables of these binomial coefficients, we can quickly find a specific combination, given its rank. Let's look at a simplified example with a list of 10 elements; these are the number of combinations with element i as the first element:
i
1 C(9,2) = 36
2 C(8,2) = 28
3 C(7,2) = 21
4 C(6,2) = 15
5 C(5,2) = 10
6 C(4,2) = 6
7 C(3,2) = 3
8 C(2,2) = 1
---
120 = C(10,3)
And these are the number of combinations with element j as the second element:
j
2 C(8,1) = 8
3 C(7,1) = 7
4 C(6,1) = 6
5 C(5,1) = 5
6 C(4,1) = 4
7 C(3,1) = 3
8 C(2,1) = 2
9 C(1,1) = 1
So if we're looking for the combination with e.g. rank 96, we look at the number of combinations for each choice of first element i, until we find which group of combinations the combination ranked 96 is in:
i    count    compare      remaining rank
1    36       96 > 36      96 - 36 = 60
2    28       60 > 28      60 - 28 = 32
3    21       32 > 21      32 - 21 = 11
4    15       11 <= 15
So we know that the first element i is 4, and that within the 15 combinations with i=4, we're looking for the eleventh combination. Now we look at the number of combinations for each choice of second element j, starting after 4:
j    count    compare      remaining rank
5    5        11 > 5       11 - 5 = 6
6    4        6 > 4        6 - 4 = 2
7    3        2 <= 3
So we know that the second element j is 7, and that the third element is the second combination with j=7, which is k=9. So the combination with rank 96 contains the elements 4, 7 and 9.
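The walk-through above can be sketched in Python as follows. This is only a minimal illustration of the linear search, not the answer's final code; the function name `unrank_linear` is made up for the example, and it computes the binomial coefficients on the fly with `math.comb` (Python 3.8+) instead of reading them from a table.

```python
from math import comb

def unrank_linear(rank, n):
    """Return the 1-based triple (i, j, k) with the given 1-based rank."""
    # Pick the first element: skip whole groups of C(n-i, 2) combinations.
    i = 1
    while rank > comb(n - i, 2):
        rank -= comb(n - i, 2)
        i += 1
    # Pick the second element: skip whole groups of n-j combinations.
    j = i + 1
    while rank > n - j:
        rank -= n - j
        j += 1
    # What is left of the rank is the offset of the third element past j.
    return i, j, j + rank

print(unrank_linear(96, 10))  # (4, 7, 9)
```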
Binary search in tables of running total of binomial coefficients
Instead of creating a table of the binomial coefficients and then performing a linear search, it is more efficient to create a table of the running totals of the binomial coefficients and perform a binary search on it. This improves the time complexity from O(N) to O(logN); in the case of N=3000, each of the two look-ups takes about log2(3000) ≈ 12 steps.
So we'd store:
i
1 36
2 64
3 85
4 100
5 110
6 116
7 119
8 120
and:
j
2 8
3 15
4 21
5 26
6 30
7 33
8 35
9 36
Note that when finding j in the second table, you have to subtract the sum corresponding with i from the sums. Let's walk through the example of rank 96 and combination [4,7,9] again; we find the first value that is greater than or equal to the rank:
i    running total    compare
3    85               96 > 85
4    100              96 <= 100
So we know that i=4; we then subtract the previous sum next to i-1, to get:
96 - 85 = 11
Now we look at the table for j, but we start after j=4, and subtract the sum corresponding to 4, which is 21, from the sums. Then again, we find the first value that is greater than or equal to the rank we're looking for (which is now 11):
j    adjusted total     compare
6    30 - 21 = 9        11 > 9
7    33 - 21 = 12       11 <= 12
So we know that j=7; we subtract the previous sum corresponding to j-1, to get:
11 - 9 = 2
So we know that the second element j is 7, and that the third element is the second combination with j=7, which is k=9. So the combination with rank 96 contains the elements 4, 7 and 9.
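Here is a minimal Python sketch of the same two binary searches, using the standard `bisect` module. The names `build_tables` and `unrank_binary` are hypothetical, and the padding zeros at the start of each table are an assumption made so that the list index equals the 1-based element number.

```python
from bisect import bisect_left
from itertools import accumulate
from math import comb

def build_tables(n):
    # i_table[i] = number of combinations whose first element is <= i (index 0 is padding).
    i_table = [0] + list(accumulate(comb(n - i, 2) for i in range(1, n - 1)))
    # j_table[j] = running total of (n - j) for second elements 2..j (indexes 0 and 1 are padding).
    j_table = [0, 0] + list(accumulate(n - j for j in range(2, n)))
    return i_table, j_table

def unrank_binary(rank, i_table, j_table):
    i = bisect_left(i_table, rank, 1)        # first i with i_table[i] >= rank
    rank += j_table[i] - i_table[i - 1]      # re-base the rank onto the j table
    j = bisect_left(j_table, rank, i + 1)    # first j > i with j_table[j] >= rank
    return i, j, j + rank - j_table[j - 1]

i_table, j_table = build_tables(10)
print(unrank_binary(96, i_table, j_table))   # (4, 7, 9)
```

Adding `j_table[i]` to the rank, rather than subtracting it from every table entry, gives the same comparison while leaving the table untouched.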
Hard-coding the look-up tables
It is of course unnecessary to generate these look-up tables again every time we want to perform a look-up. We only need to generate them once, and then hard-code them into the rank-to-combination algorithm; this should take only 2998 * 64-bit + 2998 * 32-bit = 35kB of space, and make the algorithm incredibly fast.
Inverse algorithm
The inverse algorithm, which finds the rank given a combination of elements [i,j,k], then consists of these steps:
Find the index of each element in the list; if the list is sorted (e.g. words sorted alphabetically) this can be done with a binary search in O(logN).
Find the sum in the table for i that corresponds with i-1.
Add to that the sum in the table for j that corresponds with j-1, minus the sum that corresponds with i.
Add to that k-j.
Let's look again at the same example with the combination of elements [4,7,9]:
i=4 -> table_i[3] = 85
j=7 -> table_j[6] - table_j[4] = 30 - 21 = 9
k=9 -> k-j = 2
rank = 85 + 9 + 2 = 96
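As a cross-check on the table look-ups, the same rank can be computed without any tables. The sketch below is a swapped-in variant rather than the method described above: it assumes the two running totals can be replaced by closed forms (sums of consecutive binomial coefficients telescope), and the name `rank_of` is made up for the example.

```python
from math import comb

def rank_of(i, j, k, n):
    """1-based rank of the 1-based triple i < j < k among all triples from n elements."""
    # Triples whose first element is smaller than i: C(n-1,2) + ... + C(n-i+1,2).
    before_i = comb(n, 3) - comb(n - i + 1, 3)
    # Triples starting with i whose second element is smaller than j: (n-i-1) + ... + (n-j+1).
    before_j = comb(n - i, 2) - comb(n - j + 1, 2)
    return before_i + before_j + (k - j)

print(rank_of(4, 7, 9, 10))   # 96
```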
Look-up tables for N=3000
This snippet generates the look-up table with the running total of the binomial coefficients for i = 1 to 2998:
function C(n, k) { // binomial coefficient (Pascal's triangle)
    if (k < 0 || k > n) return 0;
    if (k > n - k) k = n - k;
    if (! C.t) C.t = [[1]];
    while (C.t.length <= n) {
        C.t.push([1]);
        var l = C.t.length - 1;
        for (var i = 1; i < l / 2; i++)
            C.t[l].push(C.t[l - 1][i - 1] + C.t[l - 1][i]);
        if (l % 2 == 0)
            C.t[l].push(2 * C.t[l - 1][(l - 2) / 2]);
    }
    return C.t[n][k];
}
for (var total = 0, x = 2999; x > 1; x--) {
    total += C(x, 2);
    document.write(total + ", ");
}
This snippet generates the look-up table with the running total of the binomial coefficients for j = 2 to 2999:
for (var total = 0, x = 2998; x > 0; x--) {
total += x;
document.write(total + ", ");
}
Code example
Here's a quick code example, unfortunately without the full hardcoded look-up tables, because of the size restriction on answers on SO. Run the snippets above and paste the results into the arrays iTable and jTable (after the leading zeros) to get the faster version with hard-coded look-up tables.
function combinationToRank(i, j, k) {
    return iTable[i - 1] + jTable[j - 1] - jTable[i] + k - j;
}
function rankToCombination(rank) {
    var i = binarySearch(iTable, rank, 1);
    rank -= iTable[i - 1];
    rank += jTable[i];
    var j = binarySearch(jTable, rank, i + 1);
    rank -= jTable[j - 1];
    var k = j + rank;
    return [i, j, k];

    function binarySearch(array, value, first) {
        var last = array.length - 1;
        while (first < last - 1) {
            var middle = Math.floor((last + first) / 2);
            if (value > array[middle]) first = middle;
            else last = middle;
        }
        return (value <= array[first]) ? first : last;
    }
}
var iTable = [0]; // append look-up table values here
var jTable = [0, 0]; // and here
// remove this part when using hard-coded look-up tables
function C(n,k){if(k<0||k>n)return 0;if(k>n-k)k=n-k;if(!C.t)C.t=[[1]];while(C.t.length<=n){C.t.push([1]);var l=C.t.length-1;for(var i=1;i<l/2;i++)C.t[l].push(C.t[l-1][i-1]+C.t[l-1][i]);if(l%2==0)C.t[l].push(2*C.t[l-1][(l-2)/2])}return C.t[n][k]}
for (var iTotal = 0, jTotal = 0, x = 2999; x > 1; x--) {
    iTable.push(iTotal += C(x, 2));
    jTable.push(jTotal += x - 1);
}
document.write(combinationToRank(500, 1500, 2500) + "<br>");
document.write(rankToCombination(1893333750) + "<br>");
Is there efficient way to downscale number of elements in array by decimal factor?
I want to downsize elements from one array by certain factor.
Example:
If I have 10 elements and need to scale down by factor 2.
1 2 3 4 5 6 7 8 9 10
scaled to
1.5 3.5 5.5 7.5 9.5
Grouping 2 by 2 and use arithmetic mean.
My problem is what if I need to downsize array with 10 elements to 6 elements? In theory I should group 1.6 elements and find their arithmetic mean, but how to do that?
Before suggesting a solution, let's define "downsize" in a more formal way. I would suggest this definition:
Downsizing starts with an array a[N] and produces an array b[M] such that the following is true:
M <= N - otherwise it would be upsizing, not downsizing
SUM(b) = (M/N) * SUM(a) - The sum is reduced proportionally to the number of elements
Elements of a participate in computation of b in the order of their occurrence in a
Let's consider your example of downsizing 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 to six elements. The total for your array is 55, so the total for the new array would be (6/10)*55 = 33. We can achieve this total in two steps:
Walk the array a totaling its elements until we've reached the integer part of N/M fraction (it must be an improper fraction by rule 1 above)
Let's say that a[i] was the last element of a that we could take as a whole in the current iteration. Take the fraction of a[i+1] equal to the fractional part of N/M
Continue to the next number starting with the remaining fraction of a[i+1]
Once you are done, your array b would contain M numbers totaling to SUM(a). Walk the array once more, and scale each element by M/N (that is, divide it by N/M).
Here is how it works with your example:
b[0] = a[0] + (2/3)*a[1] = 2.33333
b[1] = (1/3)*a[1] + a[2] + (1/3)*a[3] = 5
b[2] = (2/3)*a[3] + a[4] = 7.66666
b[3] = a[5] + (2/3)*a[6] = 10.6666
b[4] = (1/3)*a[6] + a[7] + (1/3)*a[8] = 13.3333
b[5] = (2/3)*a[8] + a[9] = 16
--------
Total = 55
Scaling down by 6/10 produces the final result:
1.4 3 4.6 6.4 8 9.6 (Total = 33)
Here is a simple implementation in C++:
// b must already have the target size M and be filled with zeros
double need = ((double)a.size()) / b.size();
double have = 0;
size_t pos = 0;
for (size_t i = 0 ; i != a.size() ; i++) {
    if (need >= have+1) {
        b[pos] += a[i];
        have++;
    } else {
        double frac = (need-have); // frac is less than 1 because of the "if" condition
        b[pos++] += frac * a[i]; // frac of a[i] goes to current element of b
        have = 1 - frac;
        if (pos < b.size()) // rounding can leave a tiny remainder past the last element of b
            b[pos] += have * a[i]; // (1-frac) of a[i] goes to the next position of b
    }
}
for (size_t i = 0 ; i != b.size() ; i++) {
    b[i] /= need;
}
You will need to resort to some form of interpolation, as the number of elements to average isn't integer.
You can consider computing the prefix sum of the array, starting from 0 so that position i holds the sum of the first i elements, i.e.
1 2 3 4 5 6 7 8 9 10
yields by summation
0 1 2 3 4 5 6 7 8 9 10
0 1 3 6 10 15 21 28 36 45 55
Then perform linear interpolation to get the intermediate values that you are lacking, at 0*, 10/6, 20/6, 30/6*, 40/6, 50/6, 60/6*. (Those with an asterisk are readily available.)
0 1 10/6 2 3 20/6 4 5 6 40/6 7 8 50/6 9 10
0 1 7/3 3 6 22/3 10 15 21 77/3 28 36 39 45 55
Now you get fractional sums by subtracting values in pairs, and averages by dividing each difference by the group width 10/6. The first average is
(7/3-0)/(10/6) = 7/5
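A minimal Python sketch of the prefix-sum idea follows. It assumes the running sum is taken to start at 0 (so position i holds the sum of the first i elements and position N holds the full total); the function names `downsize` and `prefix_at` are made up for the example.

```python
from itertools import accumulate

def downsize(a, m):
    """Average len(a) values down to m values via an interpolated prefix sum."""
    n = len(a)
    prefix = [0.0] + list(accumulate(a))   # prefix[i] = sum of the first i elements

    def prefix_at(x):
        # Linear interpolation of the prefix sum at a fractional position x in [0, n].
        lo = min(int(x), n - 1)
        return prefix[lo] + (x - lo) * (prefix[lo + 1] - prefix[lo])

    # Sample the prefix sum at the m + 1 group boundaries 0, n/m, 2n/m, ..., n.
    cuts = [prefix_at(i * n / m) for i in range(m + 1)]
    width = n / m                           # input elements per output element
    return [(cuts[i + 1] - cuts[i]) / width for i in range(m)]

print(downsize(list(range(1, 11)), 6))
```

For the ten-element example with m = 6, the result comes out close to 1.4 3 4.6 6.4 8 9.6, up to floating-point rounding.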
I can't think of anything in the C++ library that will crank out something like this, all fully cooked and ready to go.
So you'll have to, pretty much, roll up your sleeves and go to work. At this point, the question of what's the "efficient" way of doing it boils down to its very basics. Which means:
1) Calculate how big the output array should be. Based on the description of the issue, you should be able to make that calculation even before looking at the values in the input array. You know the input array's size(), you can calculate the size() of the destination array.
2) So, you resize() the destination array up front. Now, you no longer need to worry about the time wasted in growing the size of the dynamic output array, incrementally, as you go through the input array, making your calculations.
3) So what's left is the actual work: iterating over the input array, and calculating the downsized values.
auto b=input_array.begin();
auto e=input_array.end();
auto p=output_array.begin();
Don't see many other options here, besides brute force iteration and calculations. Iterate from b to e, getting your samples, calculating each downsized value, and saving the resulting value into *p++.
I have a problem:
I have an N (N <= 40), the length of a sequence of zeros and ones. How do I find the number of such sequences in which there are no three "1" together?
Example:
N = 3, answer = 7
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
Here's a solution using a recursive function :
(PHP code here, but it's really simple)
$seq = '';
function tree ($node, $flag, $seq)
{
if ($flag == 3) { return 0; }
if ($node == 0) { echo $seq, ' '; return 0;}
$seq1 = $seq.'1';
$seq2 = $seq.'0';
tree($node-1, $flag+1, $seq1);
tree($node-1, 0, $seq2);
}
tree(8, 0, $seq);
I use a tree to go through all the possible sequences, and a flag to count how many 1s appear in a row.
If there are three 1s in a row, the flag reaches 3 and the function stops exploring this branch.
If we reach a leaf of the tree (i.e. $node = 0), the sequence is displayed and the function ends.
Otherwise, the function explores the two sub-trees starting from the current node.
void tree ( int node, int flag, std::string seq)
{
std::string seq1 = seq;
std::string seq2 = seq;
if(flag ==3) { return; }
if(node ==0) { printf("%s\n",seq.c_str()); return;}
seq1 += '1';
seq2 += '0';
tree(node-1, flag+1, seq1);
tree(node-1, 0, seq2);
}
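Both recursive versions above print every sequence, while the question asks for their number. Here is a minimal Python sketch of the same tree walk that counts instead of printing. The memoization via `functools.lru_cache` is an addition not present in the answers; it is assumed here so that N = 40 stays cheap, and the name `count` is made up for the example.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count(remaining, run):
    """Count ways to finish a sequence with `remaining` digits left and `run` trailing 1s so far."""
    if run == 3:          # three 1s in a row: this branch is invalid
        return 0
    if remaining == 0:    # reached a leaf: one valid sequence
        return 1
    # Append a 1 (the run grows) or a 0 (the run resets).
    return count(remaining - 1, run + 1) + count(remaining - 1, 0)

print(count(3, 0))   # 7
```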
You can write a grammar for the (non-empty) strings of this language. It's designed so that each string appears exactly once.
S := 0 | 1 | 11 | 10 | 110 | 0S | 10S | 110S
Let a_i be the total number of strings of length i in S.
First, look at the number of strings of length 1 on both sides of the grammar rule. There's a_1 in S by definition which deals with the left-hand-side.
a_1 = 2
For a_2, on the right-hand-side we immediately get two strings of length 2 (11 and 10), plus another two from the 0S rule (00 and 01). This gives us:
a_2 = 2 + a_1 = 4
Similarly, for a_3, we get:
a_3 = 1 + a_2 + a_1 = 7
(So far so good, we've got the right solution 7 for the case where the strings are length three).
For i > 3, consider the number of strings of length i on both sides.
a_i = a_{i-1} + a_{i-2} + a_{i-3}
Now we've got a recurrence we can use. A quick check for a_4...
a_4 = a_1 + a_2 + a_3 = 2 + 4 + 7 = 13.
There's 16 strings of length 4 and three containing 111: 1110, 0111, 1111. So 13 looks right!
Here's some code in Python for the general case, using this recurrence.
def strings_without_111(n):
    if n == 0: return 1
    a = [2, 4, 7]
    for _ in range(n - 1):  # xrange on Python 2
        a = [a[1], a[2], a[0] + a[1] + a[2]]
    return a[0]
This is a DP problem. I will explain the solution in a way that makes it easy to modify for counting the sequences that do not contain an arbitrary pattern a0a1a2 (where each ai is a binary value).
I will use 4 helper variables each counting the sequence up to a given length that are valid and end with 00, 01, 10, and 11 respectively. Name those c00, c01, c10, c11. It is pretty obvious that for length N = 2, those numbers are all 1:
int c00 = 1;
int c01 = 1;
int c10 = 1;
int c11 = 1;
Now assuming we have counted the sequences up to a given length k we count the sequences in the four groups for length k + 1 in the following manner:
int new_c00 = c10 + c00;
int new_c01 = c10 + c00;
int new_c10 = c01 + c11;
int new_c11 = c01;
The logic above is pretty simple - if we append a 0 to either a sequence of length k ending at 0 0 or ending at 1 0 we end up with a new sequence of length k + 1 and ending with 0 0 and so on for the other equations above.
Note that c11 is not added to the number of sequences of length k + 1 ending with 1 1. That is because appending 1 to a sequence ending with 1 1 gives an invalid sequence (ending in 1 1 1).
Here is a complete solution for your case:
// long long: for n >= 36 the count no longer fits in a 32-bit int
long long c00 = 1;
long long c01 = 1;
long long c10 = 1;
long long c11 = 1;
for (int i = 0; i < n - 2; ++i) {
    long long new_c00 = c10 + c00;
    long long new_c01 = c10 + c00;
    long long new_c10 = c01 + c11;
    long long new_c11 = c01;
    c00 = new_c00;
    c01 = new_c01;
    c10 = new_c10;
    c11 = new_c11;
}
// total valid sequences of length n
long long result = c00 + c01 + c10 + c11;
cout << result << endl;
Also you will have to take special care for the case when N < 2, because the above solution does not handle that correctly.
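A minimal Python sketch of the same four-counter DP, with the short lengths handled explicitly, might look like this. The function name `count_no_111` is made up, and returning 2^N for N < 2 is an assumption based on the fact that such short sequences cannot contain three 1s.

```python
def count_no_111(n):
    """Number of length-n binary sequences without three consecutive 1s."""
    if n < 2:
        return 2 ** n                    # lengths 0 and 1 cannot contain "111"
    c00 = c01 = c10 = c11 = 1            # the four sequences of length 2
    for _ in range(n - 2):
        # New endings 00 and 01 extend sequences ending in 0;
        # 10 extends any sequence ending in 1; 11 extends only those ending in 01.
        c00, c01, c10, c11 = c10 + c00, c10 + c00, c01 + c11, c01
    return c00 + c01 + c10 + c11

print([count_no_111(n) for n in range(6)])   # [1, 2, 4, 7, 13, 24]
```

Python integers do not overflow, so N = 40 needs no special type here.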
Finding the number of all possible sequences of N bits is easy: it is 2^N.
Finding the number of sequences that contain 111 is a bit harder.
Assume N=3 then Count = 1
111
Assume N=4 then Count = 3
0111
1110
1111
Assume N=5 then Count = 8
11100
11101
11110
11111
01110
01111
00111
10111
If you write a simple simulation program, it yields 1 3 8 20 47 107 ...
Subtracting from the total gives 2^n - count(n) = 2 4 7 13 24 44 81 149...
Google it and you get an OEIS sequence, known as the tribonacci numbers, which follow a simple recurrence:
a(n) = a(n - 1) + a(n - 2) + a(n - 3)
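The "simple simulation program" mentioned above could look like the following brute-force sketch in Python. It is only one possible way to produce those numbers; the helper name `count_with_111` is made up for the example.

```python
def count_with_111(n):
    """Brute force: how many n-bit strings contain three consecutive 1s."""
    return sum("111" in format(x, "b").zfill(n) for x in range(2 ** n))

with_111 = [count_with_111(n) for n in range(3, 9)]
without_111 = [2 ** n - c for n, c in zip(range(3, 9), with_111)]
print(with_111)      # [1, 3, 8, 20, 47, 107]
print(without_111)   # [7, 13, 24, 44, 81, 149]
```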
I have three integer variables, that can take only the values 0, 1 and 2. I want to distinguish what combination of all three numbers I have, ordering doesn't count. Let's say the variables are called x, y and z. Then x=1, y=0, z=0 and x=0, y=1, z=0 and x=0, y=0, z=1 are all the same number in this case, I will refer to this combination as 001.
Now there are a hundred ways how to do this, but I am asking for an elegant solution, be it only for educational purposes.
I thought about bitwise shifting 001 by the amount of the value:
001 << 0 = 1
001 << 1 = 2
001 << 2 = 4
But then the numbers 002 and 111 would both give 6.
The shift idea is good, but you need 2 bits to count to 3. So try shifting by twice the value:
1 << (2*0) = 1
1 << (2*1) = 4
1 << (2*2) = 16
Add these for all 3 numbers, and the first 2 bits will count how many 0 you have, the second 2 bits will count how many 1 and the third 2 bits will count how many 2.
Edit: although the result is 6 bits long (2 bits per number option 0,1,2), you only need the lowest 4 bits for a unique identifier, because once you know how many 0 and 1 you have, the number of 2 is determined as well.
So instead of doing
res = 1<<(2*x);
res+= 1<<(2*y);
res+= 1<<(2*z);
you can do
res = x*x;
res+= y*y;
res+= z*z;
because then
0*0 = 0 // doesn't change result. We don't count 0
1*1 = 1 // we count the number of 1 in the 2 lower bits
2*2 = 4 // we count the number of 2 in the 2 higher bits
hence using only 4 bits instead of 6.
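As a quick sanity check, the sketch below (in Python, with the made-up names `key_shift` and `key_square`) runs both encodings over all 27 ordered triples and confirms that each key value belongs to exactly one unordered combination.

```python
def key_shift(x, y, z):
    # Two bits per possible value: counts of 0s, 1s and 2s packed into 6 bits.
    return (1 << (2 * x)) + (1 << (2 * y)) + (1 << (2 * z))

def key_square(x, y, z):
    # 0 -> 0, 1 -> 1, 2 -> 4: count of 1s in bits 0-1, count of 2s in bits 2-3.
    return x * x + y * y + z * z

triples = [(x, y, z) for x in range(3) for y in range(3) for z in range(3)]
for key in (key_shift, key_square):
    groups = {}
    for t in triples:
        groups.setdefault(key(*t), set()).add(tuple(sorted(t)))
    # Each key value should belong to exactly one unordered combination.
    assert all(len(g) == 1 for g in groups.values())
    print(key.__name__, len(groups))   # 10 distinct keys for each encoding
```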
When the number of distinct possibilities is small, a lookup table can be used.
First, number all possible combinations of three digits, like this:
Combinations N Indexes
------------- - ------
000 0 0
001, 010, 100 1 1, 3, 9
002, 020, 200 2 2, 6, 18
011, 101, 110 3 4, 10, 12
012, 021, 102, 120, 201, 210 4 5, 7, 11, 15, 19, 21
022, 202, 220 5 8, 20, 24
111 6 13
112, 121, 211 7 14, 16, 22
122, 212, 221 8 17, 23, 25
222 9 26
The first column shows identical combinations; the second column shows the number of the combination (I assigned them arbitrarily); the third column shows the indexes of each combination, computed as 9*<first digit> + 3*<second digit> + <third digit>.
Next, build a look-up table for each of these ten combinations, using this expression as an index:
9*a + 3*b + c
where a, b, and c are the three numbers that you have. The table would look like this:
int lookup[] = {
0, 1, 2, 1, 3, 4, 2, 4, 5, 1
, 3, 4, 3, 6, 7, 4, 7, 8, 2, 4
, 5, 4, 7, 8, 5, 8, 9
};
This is a rewrite of the first table, with the value at each index taken from column N. For example, combination number 1 is found at indexes 1, 3, and 9; combination 2 is at indexes 2, 6, and 18, and so on.
To obtain the number of the combination, simply check
int combNumber = lookup[9*a + 3*b + c];
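The table does not have to be typed in by hand. A minimal Python sketch that generates it is shown below; it assumes the combinations are numbered in sorted order (000, 001, 002, 011, ...), which happens to match the arbitrary numbering in column N above.

```python
from itertools import combinations_with_replacement

# Number the unordered combinations in sorted order: 000 -> 0, 001 -> 1, ..., 222 -> 9.
numbering = {combo: n
             for n, combo in enumerate(combinations_with_replacement(range(3), 3))}

# lookup[9*a + 3*b + c] is the number of the combination that (a, b, c) belongs to.
lookup = [numbering[tuple(sorted((a, b, c)))]
          for a in range(3) for b in range(3) for c in range(3)]

print(lookup[9 * 1 + 3 * 0 + 0], lookup[9 * 0 + 3 * 0 + 1])   # 1 1
```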
For such small numbers, it would be easiest to just check them individually, instead of trying to be fancy, eg:
bool hasZero = false;
bool hasOne = false;
bool hasTwo = false;
// given: char* number or char[] number...
for(int i = 0; i < 3; ++i)
{
switch (number[i])
{
case '0': hasZero = true; break;
case '1': hasOne = true; break;
case '2': hasTwo = true; break;
default: /* error! */ break;
}
}
If I understand you correctly, you have some sequence of numbers that can either be 1, 2, or 3, where the permutation of them doesn't matter (just the different combinations).
That being the case:
std::vector<int> v{1, 2, 3};
std::sort(v.begin(), v.end());
That will keep all of the different combinations properly aligned, and you could easily write a loop to test for equality.
Alternatively, you could use a std::array<int, N> (where N is the number of possible values - in this case 3).
std::array<int, 3> a;
Where you would set a[0] equal to the number of 1s you have, a[1] equal to the number of 2s, etc.
// if your string is 111
a[0] = 3;
// if your string is 110 or 011
a[0] = 2;
// if your string is 100 or 010 or 001
a[0] = 1;
// if your string is 120
a[0] = 1;
a[1] = 1;
// if your string is 123
a[0] = 1;
a[1] = 1;
a[2] = 1;
If you are looking to store it in a single 32-bit integer:
unsigned long x = 1; // number of 1's in your string
unsigned long y = 1; // number of 2's in your string
unsigned long z = 1; // number of 3's in your string
unsigned long result = x | y << 8 | z << 16;
To retrieve the number of each, you would do
unsigned long x = result & 0x000000FF;
unsigned long y = (result >> 8) & 0x000000FF;
unsigned long z = (result >> 16) & 0x000000FF;
This is very similar to what happens in the RGB macros.
int n[3]={0,0,0};
++n[x];
++n[y];
++n[z];
Now, in the n array, you have a unique ordered combination of values for each unique unordered combination of x,y,z.
For example, both x=1,y=0,z=0 and x=0,y=0,z=1 will give you n={2,1,0}
How does this code concatenate the data from the string buffer? What is the * 10 doing? I know that subtracting '0' subtracts the ASCII code of '0', which turns the character into an integer.
char *buf; // assume that buf is assigned a value such as 01234567891234567
long key_num = 0;
someFunction(&key_num);
...
void someFunction(long *key_num) {
    for (int i = 0; i < 18; i++)
        *key_num = *key_num * 10 + (buf[i] - '0');
}
(Copied from my memory of code that I am working on recently)
It's basically an atoi-type (or atol-type) function for creating an integral value from a string. Consider the string "123".
Before starting, key_num is set to zero.
On the first iteration, that's multiplied by 10 to give you 0, then it has the character value '1' added and '0' subtracted, effectively adding 1 to give 1.
On the second iteration, that's multiplied by 10 to give you 10, then it has the character value '2' added and '0' subtracted, effectively adding 2 to give 12.
On the third iteration, that's multiplied by 10 to give you 120, then it has the character value '3' added and '0' subtracted, effectively adding 3 to give 123.
Voila! There you have it, 123.
If you change the code to look like:
#include <iostream>
char buf[] = "012345678901234567";
void someFunction(long long *key_num) {
std::cout << *key_num << std::endl;
for (int i = 0; i < 18; i++) {
*key_num = *key_num * 10 + (buf[i] - '0');
std::cout << *key_num << std::endl;
}
}
int main (void) {
long long x = 0;
someFunction (&x);
return 0;
}
then you should see it in action (I had to change your value from the 17-character array you provided in your comment to an 18-character one, otherwise you'd get some problems when you tried to use the character beyond the end; I also had to change to a long long because my longs weren't big enough):
0
0
1
12
123
1234
12345
123456
1234567
12345678
123456789
1234567890
12345678901
123456789012
1234567890123
12345678901234
123456789012345
1234567890123456
12345678901234567
As a shorter example with the number 1234, that can be thought of as:
1000 * 1 + 100 * 2 + 10 * 3 + 4
Or:
10 * (10 * (10 * 1 + 2) + 3) + 4
The first time through the loop, *key_num would be 1. The second time it is multiplied by 10 and 2 added (ie 12), the third time multiplied by 10 and 3 added (ie 123), the fourth time multiplied by 10 and 4 added (ie 1234).
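The same accumulation can be sketched in a few lines of Python. This is only an illustration of the multiply-by-10-and-add step, assuming the input contains nothing but decimal digits; the function name `digits_to_int` is made up for the example.

```python
def digits_to_int(text):
    """Build an integer from a string of decimal digits, one digit at a time."""
    value = 0
    for ch in text:
        # Shift the digits seen so far one decimal place left, then add the new digit.
        value = value * 10 + (ord(ch) - ord("0"))
    return value

print(digits_to_int("1234"))   # 1234
```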
It just multiplies the current long value (*key_num) by 10, adds the digit value, then stores the result again.
EDIT: It's not bit-shifting anything. It's just math. You can imagine it as shifting decimal digits, but it's binary internally.
key_num = 0 (0)
key_num = key_num * 10 + ('0' - '0') (0)
key_num = key_num * 10 + ('1' - '0') (1)
key_num = key_num * 10 + ('2' - '0') (12)