Counting Amount of subsets from a set

Counting Amount of subsets from a set - combinations

Let set S = {a1, a2, a3, ..., a12}, where all 12 elements are distinct. We wish to form subsets, each of which contains one or more of the elements of set S (including the possibility of using all the elements of S). The only restriction is that the subscript of each element in a specific set must be an integer multiple of the smallest subscript in that set. For example, {a2, a4, a6} is an acceptable set, as is {a6}. How many such sets can be formed?
I was thinking of dividing the problem in cases, one for 3 elements, and then 4 and 5 and so on. This would take way too long so can I have help doing it a faster way?

The constraint relates to the smallest subscript, so it makes more sense to branch on that, rather than branching on the size of the subset. You want to branch on something which simplifies the constraint, so instead of "each subscript is a multiple of the smallest subscript" you have e.g. "each subscript is a multiple of 3". Then within each branch it's much easier to count.
You only want to count non-empty subsets, so every subset has a smallest subscript:
If the subset contains a1, then each of the 11 remaining elements can be freely chosen as either present or not present. There are 2^11 such subsets.
If the smallest subscript is a2, then there are 5 possible other elements a4, a6, a8, a10, a12, each of which can be freely chosen as either present or not present. There are 2^5 such subsets.
If the smallest subscript is a3, then there are 3 possible other elements a6, a9, a12. There are 2^3 such subsets.
If the smallest subscript is a4, then there are 2 possible other elements a8, a12. There are 2^2 such subsets.
If the smallest subscript is a5 or a6, then there is one possible other element (a10 or a12 respectively). There are 2 subsets for a5 and 2 for a6.
Otherwise, the smallest subscript is a7, a8, a9, a10, a11, or a12, and the subset cannot contain any other elements due to the restriction. There is one subset in each case.
The total is therefore 2^11 + 2^5 + 2^3 + 2^2 + 2 + 2 + 1 + 1 + 1 + 1 + 1 + 1 = 2,102.
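Generalising this argument, if there are n elements in the original set, then the total number of subsets satisfying this constraint is the sum over k = 1, ..., n of 2^(floor(n/k) - 1), because a subset whose smallest subscript is k may freely include any of the other floor(n/k) - 1 multiples of k.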
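The case analysis can be checked mechanically. The following is a minimal sketch, not part of the original answer: the function names are made up for illustration, and the brute-force version simply enumerates every non-empty subset of the subscripts 1..n and compares the count with the per-smallest-subscript sum of 2^(floor(n/k) - 1).

```python
from itertools import combinations


def count_by_formula(n):
    """Sum over the smallest subscript k; the other floor(n/k) - 1 multiples of k are free choices."""
    return sum(2 ** (n // k - 1) for k in range(1, n + 1))


def count_by_brute_force(n):
    """Enumerate every non-empty subset of {1..n} and test the divisibility rule directly."""
    total = 0
    for size in range(1, n + 1):
        for subset in combinations(range(1, n + 1), size):
            smallest = subset[0]  # combinations() of a sorted range yields sorted tuples
            if all(s % smallest == 0 for s in subset):
                total += 1
    return total


print(count_by_formula(12))      # 2102
print(count_by_brute_force(12))  # 2102
```

For n = 12 the brute force only visits 4,095 subsets, so it is a cheap sanity check; for larger n only the formula is practical.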


Cell value increasing by 1 when value in another cell reaches 100

In Google Sheets, is it possible to have the value in cell A1 increase by 1 when the value in B1 reaches 100, and also reduce the value in B1 by 100?
So for example, "120" in B1 would change the value of A1 to "1" and change the value of B1 to "20".
Basically, I am looking to use the value of A1 as a whole number and the value in B1 as the decimal place but with a max of 99 on the decimal place.
Update following request for same but with varying max decimal numbers:
https://docs.google.com/spreadsheets/d/193Vbg9Dm4qDjIx4PNcmTOkMyPLdDaAr5-HaFkD1V2Eg/edit?usp=sharing
Columns A-C are input columns
Columns F-I are to work out the totals of the input columns for the relevant people and then to concatenate as a decimal figure
Columns K-N are updated totals using #player0's formula based on the max decimal place being 99 before it increases the whole number by 1 at 100.
So using the 44.120 total as an example, using decimal maximums of 50, 80 & 90 for when the whole number is changed:
For 50 - 44.120 would become 46.20
For 80 - 44.120 would become 45.40
For 90 - 44.120 would become 45.30
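The three examples above are all the same carry rule: divide the "decimal" part by the rollover value, add the quotient to the whole number, and keep the remainder. A minimal sketch of that arithmetic follows; the function name `carry` and the parameter `limit` are illustrative and do not come from the spreadsheet.

```python
def carry(whole, part, limit):
    """Move every full `limit` units of `part` into `whole`; the remainder stays in `part`."""
    extra, remainder = divmod(part, limit)
    return whole + extra, remainder


for limit in (100, 50, 80, 90):
    whole, part = carry(44, 120, limit)
    print(limit, f"{whole}.{part}")
# 100 45.20
# 50 46.20
# 80 45.40
# 90 45.30
```

In a sheet, the same quotient-and-remainder split could probably be written with the built-in QUOTIENT and MOD functions in separate output cells, since a cell cannot both hold the input and be overwritten by a formula.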

try:
=ARRAYFORMULA(SPLIT(REGEXREPLACE("0000"&A1:A6, "(.+)(.{2})$", "$1×$2"), "×"))

What you want is not 100% achievable, but this is:

A | B | C
1 | 75 | 175
0 | 75 | 75
=IF(C1>=100, LEFT(C1,1), "0") | =IF(C1>=100, RIGHT(C1, 2), C1) | Input Number Here...

Maintain Cell Value If Condition is True

What formula can I use to maintain a cell value (Integer) in excel if a certain condition is True?
*Example;
A6=5
A7=4
A8= A6 * A7 - which will give a value of 20 (in Cell A8) in this case*.
How can I keep the 20 in cell A8 if the value of A6 becomes greater than 10, but allow it to change accordingly if A6 is less than 10?

Concatenation of prefixes of a boolean array

I have a boolean array A of size n that I want to transform in the following way : concatenate every prefix of size 1,2,..n.
For instance, for n=5, it will transform "abcde" into "aababcabcdabcde".
Of course, a simple approach can loop over every prefix and use elementary bit operations (shift, mask, plus); so the complexity of this approach is obviously O(n).
There are some well known tricks regarding bit manipulations like here.
My question is: is it possible to achieve a faster algorithm for the transformation described above with a complexity better than O(n) by using bit manipulations ?
I am aware that the interest of such an improvement may be just theoretical because the simple approach could be the fastest in practice, but I am still curious about the fact that there exists a theoretical improvement or not.
To be precise, I need to perform this transformation p times, where p can be much bigger than n; some pre-computations could be done for a given n and reused for the p transformations.

I'm not sure if this is what you're looking for, but here is a different algorithm, which may be interesting depending on your assumptions.
First compute two masks that depend only on n, so for any particular n these are just constants:
C (copy mask), a mask that has every n'th bit set and is n² bits long. So for n = 5, C = 0000100001000010000100001. This will be used to create n copies of A concatenated together.
E (extract mask), a mask that indicates which bits to take from the big concatenation, which is built up from n times a block of n bits, with values 1, 3, 7, 15 ... eg for n = 5, E = 1111101111001110001100001. Pad the left with zeroes if necessary.
Then the real computation that takes an A and constructs the concatenation of prefixes is:
pext(A * C, E)
Where pext is compress_right, discarding the bits for which the extraction mask is 0 and compacting the remaining bits to the right.
The multiplication can be replaced by a "doubling" technique like this: (which can also be used to compute the C mask)
l = n
while l < n²:
    A = A | (A << l)
    l = l * 2
Which in general produces too many concatenated copies of A but you can just pretend the excess isn't there by not looking at it (the pext drops any excess input anyway). Efficiently producing E for an unknown and arbitrarily large n seems harder, but that's not really a use case for this approach anyway.
The actual time complexity depends on your assumptions, but of course in the full "arbitrary bit count" setting both multiplication and compaction are heavyweight operations and the fact that the output has a size quadratic in input size really doesn't help. For small enough n such that eg n² <= 64 (depending on word size), so everything fits in a machine word, it all works out well (even for unknown n, since all the required masks can be precomputed). On the other hand, for such small n it is also feasible to table the entire thing, doing a lookup for a pair (n, A).
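As a rough illustration of the `pext(A * C, E)` idea, here is a sketch using Python integers. It relies on several assumptions that are not in the answer: bit 0 (the least significant bit) holds the first array element, the output stores the shortest prefix in the lowest bits, and a plain loop stands in for a hardware `pext` instruction. The helper names are invented for the example.

```python
def pext(value, mask):
    """Software compress_right: gather the bits of `value` selected by `mask` into the low bits."""
    result = 0
    out_pos = 0
    while mask:
        lowest = mask & -mask      # isolate the lowest set bit of the mask
        if value & lowest:
            result |= 1 << out_pos
        out_pos += 1
        mask &= mask - 1           # clear that bit and move on
    return result


def make_masks(n):
    """Precompute C (every n'th bit set) and E (blocks with values 1, 3, 7, ...) for a given n."""
    copy_mask = sum(1 << (n * j) for j in range(n))
    extract_mask = sum(((1 << (j + 1)) - 1) << (n * j) for j in range(n))
    return copy_mask, extract_mask


def prefixes_pext(a, n):
    """Concatenate the prefixes of the n-bit value `a` via one multiply and one pext."""
    copy_mask, extract_mask = make_masks(n)
    return pext(a * copy_mask, extract_mask)


def prefixes_naive(a, n):
    """Reference loop: append the low k bits of `a` for k = 1..n, shortest prefix first."""
    result, pos = 0, 0
    for k in range(1, n + 1):
        result |= (a & ((1 << k) - 1)) << pos
        pos += k
    return result


C, E = make_masks(5)
print(f"{C:025b}")  # 0000100001000010000100001
print(f"{E:025b}")  # 1111101111001110001100001
print(all(prefixes_pext(a, 5) == prefixes_naive(a, 5) for a in range(32)))  # True
```

Because A has only n bits and the set bits of C are n positions apart, the copies produced by the multiplication never overlap, so no carries occur.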

I may have found another way to proceed.
The idea is to use multiplication to propagate the initial input I to the correct position. The multiplication coefficient J is the vector whose bits are set to one at position i*(i-1)/2 for i in [1:n].
However, a direct multiplication of I by J will provide many unwanted terms, so the idea is to:
1. mask some bits from vectors I and J
2. multiply these masked vectors
3. remove some junk bits from the result.
We have thus several iterations to do; the final result is the sum of the intermediate results. We can write the result as "sum on i of ((I & Ai) * Bi) & Ci", so we have 2 masks and 1 multiplication per iteration (Ai, Bi and Ci are constants depending on n).
This algorithm seems to be O(log(n)), so it is better than O(log(n)^2) BUT it needs multiplication which may be costly. Note also that this algorithm requires registers of size n*(n+1)/2 which is better than n^2.
Here is an example for n=7
Input:
I = abcdefg
We set J = 1101001000100001000001
We also note temporary results:
Xi = I & Ai
Yi = Xi * Bi
Zi = Yi & Ci
iteration 1
----------------------------
1                             A1
11 1  1   1    1     1        B1
11 1  1   1    1     1        C1
----------------------------
a                             X1
aa a  a   a    a     a        Y1
aa a  a   a    a     a        Z1
iteration 2
----------------------------
 11                           A2
 1 1  1   1    1     1        B2
  1 11 11  11   11    11      C2
----------------------------
 bc                           X2
  bcbc bc  bc   bc    bc      Y2
  b bc bc  bc   bc    bc      Z2
iteration 3
----------------------------
   1111                       A3
      1   1    1     1        B3
         1   11   111   1111  C3
----------------------------
   defg                       X3
         defgdefg defg  defg  Y3
         d   de   def   defg  Z3
FINAL SUM
----------------------------
aa a  a   a    a     a        Z1
  b bc bc  bc   bc    bc      Z2
         d   de   def   defg  Z3
----------------------------
aababcabcdabcdeabcdefabcdefg  SUM
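The answer only gives the masks for n = 7. The sketch below is one possible generalisation, offered as an assumption rather than the author's construction: the input is split into groups of sizes 1, 2, 4, ..., each multiplier B_i has a bit at k*(k-1)/2 for every prefix length k that reaches into the group, and C_i keeps only the elements that actually belong to each prefix. Bit 0 is taken to be the first element, and all function names are illustrative.

```python
def triangular(k):
    """Offset at which the prefix of length k starts in the output: 1 + 2 + ... + (k - 1)."""
    return k * (k - 1) // 2


def make_iterations(n):
    """Build (A_i, B_i, C_i) for input groups of sizes 1, 2, 4, ... (bit 0 = first element)."""
    iterations = []
    start, size = 0, 1
    while start < n:
        end = min(start + size, n)                 # this group covers input bits [start, end)
        a_mask = ((1 << (end - start)) - 1) << start
        b_mult = 0
        c_mask = 0
        for k in range(start + 1, n + 1):          # prefixes long enough to touch the group
            offset = triangular(k)
            b_mult |= 1 << offset                  # multiplying by this bit shifts by `offset`
            for p in range(start, min(end, k)):    # keep only elements inside prefix k
                c_mask |= 1 << (offset + p)
        iterations.append((a_mask, b_mult, c_mask))
        start, size = end, size * 2
    return iterations


def prefixes_masked_multiply(value, n):
    """Sum over i of ((I & A_i) * B_i) & C_i."""
    return sum(((value & a) * b) & c for a, b, c in make_iterations(n))


def prefixes_naive(value, n):
    """Reference loop: append the low k bits of `value` for k = 1..n, shortest prefix first."""
    result, pos = 0, 0
    for k in range(1, n + 1):
        result |= (value & ((1 << k) - 1)) << pos
        pos += k
    return result


print(len(make_iterations(7)))  # 3
print(all(prefixes_masked_multiply(v, 7) == prefixes_naive(v, 7) for v in range(128)))  # True
```

Consecutive shifts for a group differ by at least the group's size, so the shifted copies never overlap and the multiplication produces no carries; the intermediate results occupy disjoint bits, which is why summing them is the same as OR-ing them.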

Intersection between vectors [closed]

I have data in the following form.
vector<pair<unsigned,unsigned> > vecA; // first value is the resolution, second value is the prefix value; all values are 4 bits wide
vecA.push_back(make_pair(2,2)); vecA.push_back(make_pair(2,3)); vecA.push_back(make_pair(3,6)); vecA.push_back(make_pair(3,7)); vecA.push_back(make_pair(4,5));
(2,2)-> signifies that the first 2 bits of value(a 4 bit number) are 10. i.e. the value could be "1000,1001,1010,1011" in binary
(2,3)-> signifies that the first 2 bits of value(a 4 bit number) are 11 i.e. the value could be "1100,1101,1110,1111" in binary
(3,6)-> signifies that the first 3 bits of value(a 4 bit number) are 110 i.e., the value could be "1100,1101" in binary
(3,7)-> signifies that the first 3 bits of value(a 4 bit number) are 111 i.e., the value could be "1110,1111" in binary
(4,5)-> signifies that the first 4 bits of value(a 4 bit number) are 0101 i.e., the value is "0101" in binary
I have another vector containing the following:
vector<unsigned> vecB; //vecB has a by default resolution of 4. Here too the values are of 4 bits
vecB.push_back(10); vecB.push_back(6); vecB.push_back(13); vecB.push_back(12); vecB.push_back(15); vecB.push_back(5); vecB.push_back(7);
10-> signifies that the 4 bit number is: "1010"
6-> signifies that the 4 bit number is: "0110"
13-> signifies that the 4 bit number is: "1101"
12-> signifies that the 4 bit number is: "1100"
15-> signifies that the 4 bit number is: "1111", etc.
Now the intersection between vecA and vecB should perform a bit level comparison i.e. for 2 bit resolution of vecA just the first two bits of vecB should be seen.
i.e. (2,2) of vecA matches with "10" of vecB
(2,3) of vecA matches with "13,12,15" of vecB
(3,6) of vecA matches with "12,13" of vecB
(3,7) of vecA matches with "15" of vecB
(4,5) matches with "5" of vecB
The intersection should only return the matching values from vecB. i.e. the intersection should return "10,13,12,15,5" as the result.
How can I perform this intersection efficiently in c++?
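// Collect each value of vecB that matches at least one (resolution, prefix) pair of vecA.
const unsigned BITS = 4; // all values are 4 bits wide
vector<unsigned> ans;
for(vector<unsigned>::iterator i2=vecB.begin(), l2=vecB.end(); i2!=l2; ++i2)
{
    for(vector<pair<unsigned,unsigned> >::iterator i1=vecA.begin(), l1=vecA.end(); i1!=l1; ++i1)
    {
        // Keep only the top `first` bits of the value and compare them with the prefix.
        if(((*i2)>>(BITS-(*i1).first))==(*i1).second)
        {
            ans.push_back(*i2);
            break; // report each vecB value once, even if several prefixes match
        }
    }
}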

(2,2) represents 10??, where we don't care what ?? are. This is the half-open range 1000 through 1100, aka [2 << 2, (2+1)<<2).
So, produce a set of ranges from the LHS. Fuse any ranges that overlap. You'll have a set of start/finish intervals.
Now sort the RHS. Next, walk through it, keeping track when you enter/exit the LHS intervals. Those that are in the LHS intervals are in the intersection.
The RHS sorting takes O(|RHS| lg |RHS|). The walking takes O(|RHS| + |LHS|).
Making the LHS intervals takes O(|LHS| lg |LHS|) time (including time to sort by start-of-interval). Merging them is a single pass, also O(|LHS|).
So the end result is O(|RHS| lg |RHS| + |LHS| lg |LHS|) time to calculate intersection, instead of O(|RHS| * |LHS|) of your solution above.
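The interval approach can be sketched as follows. This is an illustrative version, not the answerer's code: the function name `intersect` and the `bits` parameter are assumptions, and the result comes back in sorted order rather than in the original order of vecB.

```python
def intersect(prefixes, values, bits=4):
    """Return the values (sorted) whose top `resolution` bits match some (resolution, prefix) pair."""
    # Each pair covers a half-open range of full-width values, e.g. (2, 2) -> [8, 12).
    intervals = sorted(
        (prefix << (bits - res), (prefix + 1) << (bits - res)) for res, prefix in prefixes
    )
    # Fuse overlapping or touching intervals in a single pass.
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    # Walk the sorted values and the merged intervals together.
    result = []
    i = 0
    for v in sorted(values):
        while i < len(merged) and merged[i][1] <= v:
            i += 1                      # this interval ends before v; it cannot match later values
        if i < len(merged) and merged[i][0] <= v:
            result.append(v)
    return result


vec_a = [(2, 2), (2, 3), (3, 6), (3, 7), (4, 5)]
vec_b = [10, 6, 13, 12, 15, 5, 7]
print(intersect(vec_a, vec_b))  # [5, 10, 12, 13, 15]
```

Here the five pairs collapse to just two intervals, [5, 6) and [8, 16), which is what makes the final walk linear.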

Data structure for fast range searches of dense dataset 4D vectors

I have millions of unstructured 3D vectors associated with arbitrary values, making for a set of 4D vectors. To make it simpler to understand: I have Unix timestamps, each associated with hundreds of thousands of 3D vectors. And I have many timestamps, making for a very large dataset: upwards of 30 million vectors.
I need to search the datasets of specific timestamps.
So let's say I have the following data:
For time stamp 1407633943:
(0, 24, 58, 1407633943)
(9, 2, 59, 1407633943)
...
For time stamp 1407729456:
(40, 1, 33, 1407729456)
(3, 5, 7, 1407729456)
...
etc etc
And I wish to make a very fast query along the lines of:
Query Example 1:
Give me vectors between:
X > 4 && X < 9 && Y > -29 && Y < 100 && Z > 0.58 && Z < 0.99
Give me list of those vectors, so I can find the timestamps.
Query Example 2:
Give me vectors between:
X > 4 && X < 9 && Y > -29 && Y < 100 && Z > 0.58 && Z < 0.99 && W (timestamp) = 1407729456
So far I've used SQLite for the task, but even after column indexing, the thing takes between 500ms - 7s per query. I'm looking for somewhere between 50ms-200ms per query solution.
What sort of structures or techniques can I use to speed the query up?
Thank you.

kd-trees can be helpful here. Range search in a kd-tree is a well-known problem. The time complexity of one query depends on the output size, of course (in the worst case the whole tree is traversed if all vectors match). But it can work pretty fast on average.
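A minimal kd-tree range search might look like the sketch below. It is only an illustration under stated assumptions: the tree splits on x, y and z, the timestamp is carried as a fourth tuple entry and ignored when splitting, bounds are exclusive as in the question's queries, and the names `build` and `range_search` are made up.

```python
DIMS = 3  # split on x, y, z; the 4th tuple entry (timestamp) is carried as payload


def build(points, depth=0):
    """Return a nested (point, left, right) tuple, or None for an empty list."""
    if not points:
        return None
    axis = depth % DIMS
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid], build(points[:mid], depth + 1), build(points[mid + 1:], depth + 1))


def range_search(node, lo, hi, depth=0, out=None):
    """Collect points with lo[i] < p[i] < hi[i] on every axis, pruning subtrees that cannot match."""
    if out is None:
        out = []
    if node is None:
        return out
    point, left, right = node
    axis = depth % DIMS
    if all(lo[i] < point[i] < hi[i] for i in range(DIMS)):
        out.append(point)
    if point[axis] > lo[axis]:   # left subtree only holds values <= point[axis]
        range_search(left, lo, hi, depth + 1, out)
    if point[axis] < hi[axis]:   # right subtree only holds values >= point[axis]
        range_search(right, lo, hi, depth + 1, out)
    return out


data = [
    (0, 24, 58, 1407633943),
    (9, 2, 59, 1407633943),
    (40, 1, 33, 1407729456),
    (3, 5, 7, 1407729456),
]
tree = build(data)
print(sorted(range_search(tree, lo=(0, 0, 0), hi=(10, 10, 60))))
# [(3, 5, 7, 1407729456), (9, 2, 59, 1407633943)]
```

A query that also fixes the timestamp could filter the returned tuples on their fourth entry, or a separate tree could be built per timestamp; which is faster would depend on the data.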

I would use an octree. In each node I would store arrays of vectors in a hashtable using the timestamp as a key.
To further increase the performance you can use CUDA, OpenCL, OpenACC, OpenMP and implement the algorithms to be executed in parallel on the GPU or a multi-core CPU.

BKaun: please accept my attempt at giving you some insight into the problem at hand. I suppose you have thought of every one of my points, but maybe seeing them here will help.
Regardless of how ingest data is presented, consider that, using the C programming language, you can reduce the storage size of the data to minimize space and search time. You will be searching for, loading, and parsing single bits of a vector instead of, say, a SHORT INT which is 2 bytes for every entry - or a FLOAT which is much more. The object, as I understand it, is to search the given data for given values of X, Y, and Z and then find the timestamp associated with these 3 while optimizing the search. My solution does not go into the search, but merely the data that is used in a search.
To illustrate my hints simply, I'm considering that the data consists of 4 vectors:
X between -2 and 7,
Y between 0.17 and 3.08,
Z between 0 and 50,
timestamp (many of same size - 10 digits)
To optimize, consider how many various numbers each vector can have in it:
1. X can be only 10 numbers (include 0)
2. Y can be 3.08 minus 0.17 = 2.91 x 100 = 291 numbers
3. Z can be 51 numbers
4. timestamp can be many (but in this scenario,
you are not searching for a certain one)
Consider how each variable is stored as a binary:
1. Each entry in Vector X COULD be stored in 4 bits, using the first bit=1 for
the negative sign:
7="0111"
6="0110"
5="0101"
4="0100"
3="0011"
2="0010"
1="0001"
0="0000"
-1="1001"
-2="1010"
However, the original data that you are searching through may range
from -10 to 20!
Therefore, adding another 2 bits gives you a table like this:
-10="101010"
-9="101001" ...
...
-2="100010"
-1="100001" ...
...
8="001000"
9="001001" ...
...
19="010011"
20="010100"
And that's only 6 bits to store each X vector entry for integers from -10 to 20
For search purposes on a range of -10 to 20, there are 31 different X Vector entries
possible to search through.
Each entry in Vector Y COULD be stored in 9 bits (no extra sign bit is needed)
The 1's and 0's COULD be stored (accessed, really) in 2 parts
(integer part, and a 2 digit decimal).
Part 1 can be 0, 1, 2, or 3 (4 values, 2 bits each, from "00" to "11")
However, if the range of the entire Y dataset is 0 to 10,
part 1 can be 0, 1, ...9, 10 (which is 11 values, 4 bits each,
from "0000" to "1010")
Part 2 can be 00, 01,...98, 99 (100 values, 7 bits each, from "0000000" to "1100011")
Total storage bits for Vector Y entries is 4 + 7 = 11 bits in the
range 00.00 to 10.99
For search purposes on a range 00.00 to 10.99, there are 1100 different Y Vector
entries possible to search through (11x100)
Each entry in Vector Z in the range of 0 to 50 COULD be stored in 6 bits
("000000" to "110010").
Again, the actual data range may be 7 bits long (for simplicity's sake)
0 to 64 ("0000000" to "1000000")
For search purposes on a range of 0 to 64, there are 65 different Z Vector entries
possible to search through.
Consider that you will be storing the data in this optimized format, in a single
succession of bits:
X=4 bits + 2 range bits = 6 bits
+ Y=4 bits part 1 and 7 bits part 2 = 11 bits
+ Z=7 bits
+ timestamp (10 numbers - each from 0 to 9 ("0000" to "1001") 4 bits each = 40 bits)
= TOTAL BITS: 6 + 11 + 7 + 40 = 64 stored bits for each 4D vector
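The 64-bit layout above can be sketched with plain shifts and masks. The answer mentions C; Python is used here only to keep the illustration short, and the field order, the function names and the choice to pass Y as two integers (whole part and hundredths) are assumptions made for the example.

```python
def pack(x, y_whole, y_frac, z, timestamp):
    """Pack one record into 64 bits: X 6 bits, Y 4 + 7 bits, Z 7 bits, timestamp 40 bits (BCD)."""
    assert -31 <= x <= 31 and 0 <= y_whole <= 15 and 0 <= y_frac <= 99 and 0 <= z <= 127
    assert 0 <= timestamp < 10 ** 10
    x_bits = (1 << 5 if x < 0 else 0) | abs(x)   # sign bit followed by 5 magnitude bits
    bcd = 0
    for digit in f"{timestamp:010d}":            # ten decimal digits, 4 bits each
        bcd = (bcd << 4) | int(digit)
    word = x_bits
    word = (word << 4) | y_whole
    word = (word << 7) | y_frac
    word = (word << 7) | z
    word = (word << 40) | bcd
    return word


def unpack(word):
    """Reverse of pack(): peel the fields off from the low end."""
    bcd = word & ((1 << 40) - 1)
    timestamp = int(f"{bcd:010x}")               # BCD nibbles read back as decimal digits
    word >>= 40
    z = word & 0x7F
    word >>= 7
    y_frac = word & 0x7F
    word >>= 7
    y_whole = word & 0xF
    word >>= 4
    magnitude = word & 0x1F
    x = -magnitude if word & 0x20 else magnitude
    return x, y_whole, y_frac, z, timestamp


record = (-2, 3, 8, 50, 1407633943)              # X = -2, Y = 3.08, Z = 50
packed = pack(*record)
print(packed.bit_length() <= 64)                 # True
print(unpack(packed) == record)                  # True
```

Storing the timestamp as a plain binary integer instead of BCD would need only 34 bits for ten decimal digits, at the cost of no longer matching the digit-by-digit layout described above.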
THE SEARCH:
Input xx, yy, zz to search for in arrays X, Y and Z (which are stored in binary)
Change xx, yy, and zz to binary bit strings per optimized format above.
function(xx, yy, zz)
Search for X first, since it has 31 possible outcomes (range is -10 to 20)
- the lowest number of any array
First search for positive targets (there are 8 of them and better chance
of finding one)
These all start with "000"
7="000111"
6="000110"
5="000101"
4="000100"
3="000011"
2="000010"
1="000001"
0="000000"
So you can check if the first 3 bits = "000". If so, you have a number
between 0 and 7.
Found: search for Z
Else search for xx=-2 or -1: does X = -2="100010" or -1="100001" ?
(do second because there are only 2 of them)
Found: Search for Z
NotFound: next X
Search for Z after X is Found: (Z second, since it has 65 possible outcomes
- range is 0 to 64)
You are searching for 6 bits of a 7 bit binary number
("0000000" to "1000000") If bits 1,2,3,4,5,6 are all "0", analyze bit 0.
If it is "1" (it's 64), next Z
Else begin searching 6 bits ("000000" to "110010") with LSB first
Found: Search for Y
NotFound: Next X
Search for Y (Y last, since it has 1100 possible outcomes - range is 0.00 to 10.99)
Search for Part 1 (integer part) bits (you are searching for
"0000", "0001" or "0011" only, so use yyPt1=YPt1)
Found: Search for Part 2 ("0000000" to "1100100") using yyPt2=YPt2
(direct comparison)
Found: Print out X, Y, Z, and timestamp
NotFound: Search criteria for X, Y, and Z not found in data.
Print X,Y,Z,"timestamp not found". Ask for new X, Y, Z. New search.
