Mathematica collecting elements in a list - tuples

Given the 256 tuples generated from:
Tuples[{a,b,c,d},4] = {{a,a,a,a},{a,a,a,b},...,{d,d,d,d}}
I would like to filter all of the tuples that have exactly 3 of a kind. For example, I want to keep {c,b,c,c}, {a,a,d,a}, etc., but not {d,d,d,d} or {a,b,b,c}.
I know there are:
Binomial[4,3]*4*3 = 48
such tuples from simple maths. But I am looking for a programmatic way of counting these.
My final goal is from the tuples:
Tuples[{1,2,3,...,n},k]
I would like to know how many of those tuples have exactly one subset with m of a kind, with all other subgroups of a kind having size less than m.
In case you are interested, this problem spawned from asking: What is the average number of rounds played before there is a winner in the game "Cards Against Humanity"? Assuming we have n players and the first person with x cards wins.

This will find your 48 tuples
Select[Tuples[{a, b, c, d}, 4],
MatchQ[Sort[#], {a_, a_, a_, b_} | {b_, a_, a_, a_}] &&
Length[Union[#]] != 1 &]
This will show you the tuples of four items over 1,...,6 with m identical items and all other items appearing less than m times.
m = 2;
f[v_] := Module[{runlens},
  runlens = Sort[Map[Length, Split[Sort[v]]]];
  runlens[[-1]] == m && If[Length[runlens] == 1, True, runlens[[-2]] < m]
];
Select[Tuples[Range[6], 4], f]
Use Length on that result and you know how many you have.
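For comparison, the same filter can be sketched in Python (a hypothetical helper, not part of the original answers). It uses collections.Counter in place of Split[Sort[v]] to get the group sizes, and keeps a tuple when its largest group has size m and the next largest, if any, is smaller. Enumerating all n^k tuples is only practical for small n and k.

```python
from collections import Counter
from itertools import product


def count_exactly_one_m_of_a_kind(n: int, k: int, m: int) -> int:
    """Count k-tuples over n symbols where one symbol appears exactly m times
    and every other symbol appears fewer than m times."""
    total = 0
    for t in product(range(n), repeat=k):
        # Group sizes, largest first (the analogue of the sorted run lengths).
        sizes = sorted(Counter(t).values(), reverse=True)
        if sizes[0] == m and (len(sizes) == 1 or sizes[1] < m):
            total += 1
    return total


print(count_exactly_one_m_of_a_kind(4, 4, 3))  # the 48 tuples from the question
```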

Another approach:
Select[ Tuples[{a, b, c, d}, 4] ,
((Count[#, 3] == 1 && Max[#] == 3) &@Tally[#][[All, 2]] ) & ]
Of course, if the set size is greater than half the list length, it is redundant to check both Max and Count.


Looking for hints to solve this dynamic programming problem

I am trying to improve my problem solving skills for programming interviews and am trying to solve this problem. I have a feeling it can be solved using dynamic programming but the recursive relationship is not obvious to me.
To select the first three choir singers I simply use brute force, since there are only 20 choose 3 = 1140 ways to pick them. At first I thought dp[a][b][c] could represent the shortest song with three choir singers whose remaining breath is a, b, c. I could calculate this using dp[a][b][c] = 1 + dp[a - 1][b - 1][c - 1], but what should be done when any of the indices equals 0, i.e. which choir singer should be substituted in? Additionally, we cannot reuse the dp array: say in one instance we start with choir singers with breath a, b, c and in the second instance with d, e, f. Once the first instance has been calculated and the dp array filled, the second instance may need to use dp[i][j][k] computed by the first instance. Since this value depends on the singers available in the first instance, and the available singers differ between the two instances, dp[i][j][k] may not be achievable in the second instance. This is because the shortest song length dp[i][j][k] may use choir singers which, in the second instance, are already being used.
I am out of ideas to tackle this problem and there is no solution anywhere. Could someone give me some hints to solve it?
Problem statement
We have N singers, who each have a certain time they can sing for and need 1 second to recover once out of breath. What is the shortest song they can sing, where three singers are singing at all times and all three finish singing simultaneously?
Input:
3 < N <= 20
N integers Fi (1 <= Fi <= 10, for all 1 <= i <= N)
Here is the idea.
At each point in the singing, the current state can be represented by who the singers are, how long they have been singing, and which ones are currently out of breath. From each state we need to transition to a new state, in which every singer out of breath is ready to sing again, every singer singing is good for one less turn, and new singers might be chosen.
Done naively, there are 20 choose 3 ways to pick the singers, each of whom can be in 10 current states, plus up to 2 more singers who are out of breath. That is 175,560,000 combined states you can be in. That's too many; we need to be more clever to make this work.
Being more clever, we note that we do not have 20 distinguishable singers. We have 10 buckets of singers based on how long they can sing for. If a singer can sing for 7 turns, they can't be in 10 states while singing, but only 7. We do not care whether the two singers who can sing for 7 turns are at 4 and 3 turns left or at 3 and 4; those states are the same. This introduces a lot of symmetries. Once we take care of all of them, the number of possible states we might be in drops from hundreds of millions to (usually) tens of thousands.
And now we have a state transition for our DP, from dp[state1] to dp[state2]. The challenge is to produce a state representation that takes advantage of these symmetries and that you can use as keys in your data structure.
UPDATE:
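The main loop of the code would look like this in Python:
song_length = 0
finished = False
# current_states starts as the set of possible opening states;
# transitions() and is_finished() are the parts you still have to write.
while not finished:
    song_length += 1
    next_states = set()
    for state in current_states:
        for next_state in transitions(state):
            if is_finished(next_state):
                finished = True  # Could break out of the loops here
            else:
                next_states.add(next_state)
    current_states = next_states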
Most of the challenge is a good representation of a state, and your transitions function.
The state in terms of memoisation seems unrelated to the time elapsed since the start. Take any starting position,
a, b, c
where a, b, c are chosen magnitudes (how long each singer can hold their breath), and a is the smallest magnitude. We have
a, b, c
t = 0
and it's the same as:
0, b - a, c - a
t = a
So let's define the initial state with smallest magnitude a as:
b, c, ba, ca
where ba = b - a
ca = c - a
t = a
From here, every transition of the state is similar:
new_a <- x
  where x is a magnitude in the list that can be available together with b and c. (We only need to try each such unique magnitude once during this iteration. We must also prevent a singer from repeating.)
let m = min(new_a, ba, ca)
then the new state is:
u, v, um, vm
t = t + m
  where u and v are the elements of [new_a, b, c] that aren't associated with m, and um and vm are their pairs from [new_a, ba, ca] that aren't m, with m subtracted from each.
The state for memoisation of visited combinations can be just:
[(b, ba), (c, ca)] sorted by the tuples' first element,
with which we can prune a branch of the search if the t reached for that state is equal to or higher than the minimal one seen so far.
Example:
2 4 7 6 5
Solution (read each column top-down):
4 5 6
7 4 5
  2
States:
u v um vm
5 6 1 2
t = 4
new_a = 7
m = min(7, 1, 2) = 1 (associated with 5)
7 6 6 1
t = 5
new_a = 4
m = min(4, 6, 1) = 1 (associated with 6)
4 7 3 5
t = 6
new_a = 5
m = min(5, 3, 5) = 3 (associated with 4)
5 7 2 2
t = 9
new_a = 2
m = min(2, 2, 2) = 2 (associated with 2)
5 7 0 0
t = 11
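To make the transition rule concrete, here is a small Python sketch that replays the example above. The names step and State are hypothetical, and the sketch only applies the arithmetic of one transition; it does not check singer availability or the 1-second recovery rule. On a tie for m it picks the first of [new_a, u, v], which matches the last step of the example.

```python
from typing import Tuple

# (u, v, um, vm, t), the same columns as the table above
State = Tuple[int, int, int, int, int]


def step(state: State, new_a: int) -> State:
    """One transition: new_a joins, and the singer associated with m leaves."""
    u, v, um, vm, t = state
    mags = [new_a, u, v]    # magnitudes of the three singers now singing
    left = [new_a, um, vm]  # time each of them has left
    m = min(left)
    gone = left.index(m)    # the singer associated with m (first one on a tie)
    a, b = [i for i in range(3) if i != gone]
    return (mags[a], mags[b], left[a] - m, left[b] - m, t + m)


state: State = (5, 6, 1, 2, 4)  # 4, 5, 6 started together and 4 has run out
for new_a in (7, 4, 5, 2):
    state = step(state, new_a)
    print(state)
```

It prints (7, 6, 6, 1, 5), (4, 7, 3, 5, 6), (5, 7, 2, 2, 9) and (5, 7, 0, 0, 11), i.e. the rows of the worked example ending at t = 11.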
Python code:
import heapq
from itertools import combinations
def f(A):
    # Count how many singers share each magnitude (breath length).
    mag_counts = {}
    for x in A:
        if x in mag_counts:
            mag_counts[x] = mag_counts[x] + 1
        else:
            mag_counts[x] = 1
    q = []
    seen = set()
    # Initialise the queue with unique starting combinations
    for comb in combinations(A, 3):
        sorted_comb = tuple(sorted(comb))
        if sorted_comb not in seen:
            (a, b, c) = sorted_comb
            heapq.heappush(q, (a, (b-a, b), (c-a, c), a))
            seen.add(sorted_comb)
    while q:
        # Always expand the state with the smallest elapsed time t.
        (t, (ba, b), (ca, c), prev) = heapq.heappop(q)
        if ba == 0 and ca == 0:
            return t
        for mag in mag_counts.keys():
            # Check that the magnitude is available
            # and the same singer is not repeating.
            [three, two] = [3, 2] if mag != prev else [4, 3]
            if mag == b == c and mag_counts[mag] < three:
                continue
            elif mag == b and mag_counts[mag] < two:
                continue
            elif mag == c and mag_counts[mag] < two:
                continue
            elif mag == prev and mag_counts[mag] < 2:
                continue
            m = min(mag, ba, ca)
            if m == mag:
                heapq.heappush(q, (t + m, (ba-m, b), (ca-m, c), m))
            elif m == ba:
                heapq.heappush(q, (t + m, (mag-m, mag), (ca-m, c), b))
            else:
                heapq.heappush(q, (t + m, (mag-m, mag), (ba-m, b), c))
    return float('inf')
As = [
    [3, 2, 3, 3],     # 3
    [1, 2, 3, 2, 4],  # 3
    [2, 4, 7, 6, 5]   # 11
]
for A in As:
    print(A, f(A))

Find the same numbers between [a,b] intervals

Suppose I have 3 arrays of consecutive numbers
a = [1, 2, 3]
b = [2, 3, 4]
c = [3, 4]
Then the same number that appears in all 3 arrays is 3.
My algorithm is to use two nested for loops to find the numbers common to two arrays and push them into another array (let's call it d). Then
d = [2, 3] (d = a overlap b)
Then I use it again to check arrays d and c. The final result is 1, because only 1 number appears in all 3 arrays:
e = [3] (e = c overlap d) => e.length = 1
Also, if there is only 1 array, the algorithm should return the length of that array, since all of its numbers appear in it. But I think my algorithm above would take too long, because the numbers in the arrays can go up to 10^5. So, any idea of a better algorithm?
Yes, since these are ranges, you basically want to calculate the intersection of the ranges. This means that you can calculate the maximum m of all the first elements of the lists, and the minimum n of all the last elements of the list. All the numbers between m and n (both inclusive) are then members of all lists. If m>n, then there are no numbers in these lists.
You do not need to calculate the overlap by enumerating over the first list and checking whether each element is a member of the other list. Since these are consecutive numbers, we can easily find out what the overlap is.
In short, the overlap of [a, ..., b] and [c, ..., d] is [ max(a,c), ..., min(b,d) ], there is no need to check the elements in between.
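A minimal Python sketch of this, assuming each input array is a non-empty, ascending run of consecutive integers (the function name common_count is hypothetical). It reads only the first and last element of each array, so it runs in time linear in the number of arrays, and a single array naturally yields its own length.

```python
from typing import Sequence


def common_count(ranges: Sequence[Sequence[int]]) -> int:
    """Count the numbers shared by all arrays of consecutive integers."""
    lo = max(r[0] for r in ranges)   # m: the largest first element
    hi = min(r[-1] for r in ranges)  # n: the smallest last element
    return max(0, hi - lo + 1)       # empty intersection when m > n


print(common_count([[1, 2, 3], [2, 3, 4], [3, 4]]))  # only 3 is shared
```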

Algorithm to get best combination

I have items with IDs 1, 3, 4, 5, 6, 7. Now I have data like the following.
There is an offerId for each row. The array of Ids contains the combination of IDs covered by that offer, and Discount is the value of that offerId.
offerId : Array of Ids : Discount
o1 : [1] : 45
o2 : [1 3 4] : 100
o3 : [3 5] : 55
o4 : [5] : 40
o5 : [6] : 30
o6 : [6 7] : 20
Now I have to select all the offerIds which give me best combination of Ids i.e. maximum total discount.
For example, in the above case the result with the maximum discount is:
[o2, o4, o5], with a total discount of 170 (100 + 40 + 30).
Note: the resulting offerIds should be such that Ids don't repeat. For example, for o2, o4, o5 the ids are [1,3,4], [5], [6], which are all distinct.
Another combination could be:
o1, o3, o6, for which the ids are [1], [3,5], [6,7]. However, the total is 120 (45 + 55 + 20), which is less than the 170 of the previous case.
I need an algorithm/code which will help me to identify combination of offerIds which will give maximum discount , considering that each offer should contain distinct Ids.
NOTE: I am writing my code in Go, but solutions/logic in any language will be helpful.
NOTE: I hope I have explained my requirement properly. Please comment if any extra information is required. Thanks.
Here is a dynamic programming solution which, for every possible subset of IDs, finds the combination of offers for which the discount is maximum possible.
This will be pseudocode.
Let our offers be structures with fields offerNumber, setOfItems and discount.
For the purposes of implementation, we first renumber the possible items by integers from zero to the number of different possible items (say k) minus one.
After that, we can represent setOfItems by a binary number of length k.
For example, if k = 6 and setOfItems = 101110 (binary), this set includes items 5, 3, 2 and 1 and excludes items 4 and 0, since bits 5, 3, 2 and 1 are ones and bits 4 and 0 are zeroes.
Now let f[s] be the best discount we can get using exactly the set s of items.
Here, s can be any integer between 0 and 2^k - 1, representing one of the 2^k possible subsets.
Furthermore, let p[s] be the list of offers which together allow us to get discount f[s] for the set of items s.
The algorithm goes as follows.
initialize f[0] to zero, p[0] to empty list
initialize f[>0] to minus infinity
initialize bestF to 0, bestP to empty list
for each s from 0 to 2^k - 1:
    for each o in offers:
        if s & o.setOfItems == o.setOfItems:  // o.setOfItems is a subset of s
            if f[s] < f[s - o.setOfItems] + o.discount:  // minus is set subtraction
                f[s] = f[s - o.setOfItems] + o.discount
                p[s] = p[s - o.setOfItems] append o.offerNumber
    if bestF < f[s]:
        bestF = f[s]
        bestP = p[s]
After that, bestF is the best possible discount, and bestP is the list of offers which get us that discount.
The complexity is O(|offers| * 2^k), where k is the total number of items.
Here is another implementation which is asymptotically the same, but might be faster in practice when most subsets are unreachable.
It is "forward" instead of "backward" dynamic programming.
initialize f[0] to zero, p[0] to empty list
initialize f[>0] to -1
initialize bestF to 0, bestP to empty list
for each s from 0 to 2^k - 1:
    if f[s] >= 0:  // only for reachable s
        if bestF < f[s]:
            bestF = f[s]
            bestP = p[s]
        for each o in offers:
            if s & o.setOfItems == 0:  // s and o.setOfItems don't intersect
                if f[s + o.setOfItems] < f[s] + o.discount:  // plus is set addition
                    f[s + o.setOfItems] = f[s] + o.discount
                    p[s + o.setOfItems] = p[s] append o.offerNumber
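A minimal Python sketch of the first (backward) DP above, run on the question's data. The function name best_offer_combination and the (offerId, item IDs, discount) tuple layout are assumptions made for illustration; item IDs are renumbered to bit positions as described, and every offer is assumed to list at least one distinct item.

```python
from typing import List, Tuple

Offer = Tuple[str, List[int], int]  # (offerId, item IDs, discount)


def best_offer_combination(offers: List[Offer]) -> Tuple[float, List[str]]:
    # Renumber the distinct item IDs as bit positions 0..k-1.
    items = sorted({item for _, ids, _ in offers for item in ids})
    bit = {item: pos for pos, item in enumerate(items)}
    k = len(items)
    # Encode each offer's item set as a bitmask.
    masks = []
    for name, ids, disc in offers:
        mask = 0
        for item in ids:
            mask |= 1 << bit[item]
        masks.append((name, mask, disc))
    f = [float("-inf")] * (1 << k)  # f[s]: best discount using exactly item set s
    p: List[List[str]] = [[] for _ in range(1 << k)]  # offers achieving f[s]
    f[0] = 0
    best_f, best_p = 0.0, []
    for s in range(1, 1 << k):
        for name, mask, disc in masks:
            if (s & mask) == mask:  # the offer's items are a subset of s
                rest = s & ~mask    # set subtraction
                if f[rest] + disc > f[s]:
                    f[s] = f[rest] + disc
                    p[s] = p[rest] + [name]
        if f[s] > best_f:
            best_f, best_p = f[s], p[s]
    return best_f, best_p


offers = [("o1", [1], 45), ("o2", [1, 3, 4], 100), ("o3", [3, 5], 55),
          ("o4", [5], 40), ("o5", [6], 30), ("o6", [6, 7], 20)]
print(best_offer_combination(offers))
```

For the example offers it should report a best discount of 170 from o2, o4 and o5; the order of the returned list reflects the order in which the DP appended the offers.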

Finding missing number using binary search

I am reading the book Programming Pearls.
Question: Given a sequential file that contains at most four billion 32-bit integers in random order, find a 32-bit integer that isn't in the file (and there must be at least one missing). This problem has to be solved if we have a few hundred bytes of main memory and several sequential files.
Solution: To set this up as a binary search we have to define a range, a representation for the elements within the range, and a probing method to determine which half of a range holds the missing integer. How do we do this?
We'll use as the range a sequence of integers known to contain at least one missing element, and we'll represent the range by a file containing all the integers in it. The insight is that we can probe a range by counting the elements above and below its midpoint: either the upper or the lower range has at most half the elements in the total range. Because the total range has a missing element, the smaller half must also have a missing element. These are most of the ingredients of a binary search algorithm for the above problem.
The above text is copyright Jon Bentley, from the book Programming Pearls.
Some info is provided at the following link:
"Programming Pearls" binary search help
How do we search in passes using binary search? I also could not follow the example given in the above link. Please help me understand the logic with just 5 integers rather than a million integers.
Why don't you re-read the answer in the post "Programming Pearls" binary search help. It explains the process on 5 integers as you ask.
The idea is that you parse each list and break it into 2 separate lists (this is where the binary part comes from) based on the value of the first bit.
I.e., showing the binary representation of the actual numbers:
Original List "": 001, 010, 110, 000, 100, 011, 101 => (broken into)
(we remove the first bit and append it to the "name" of the new list)
To form each of the lists below we took the values starting with [0 or 1] from the list above
List "0": 01, 10, 00, 11 (is formed from subset 001, 010, 000, 011 of List "" by removing the first bit and appending it to the "name" of the new list)
List "1": 10, 00, 01 (is formed from subset 110, 100, 101 of List "" by removing the first bit and appending it to the "name" of the new list)
Now take one of the resulting lists in turn and repeat the process:
List "0" becomes your original list and you break it into
List "00" and
List "01" (the last digit of each name is again the first remaining bit of the numbers in the list being broken)
Carry on until you end up with the empty list(s).
EDIT
Process step by step:
List "": 001, 010, 110, 000, 100, 011, 101 =>
List "0": 01, 10, 00, 11 (from subset 001, 010, 000, 011 of the List "") =>
List "00": 1, 0 (from subset 01, 00 of the List "0") =>
List "000": 0 [final result] (from subset 0 of the List "00")
List "001": 1 [final result] (from subset 1 of the List "00")
List "01": 0, 1 (from subset 10, 11 of the List "0") =>
List "010": 0 [final result] (from subset 0 of the List "01")
List "011": 1 [final result] (from subset 1 of the List "01")
List "1": 10, 00, 01 (from subset 110, 100, 101 of the List "") =>
List "10": 0, 1 (from subset 00, 01 of the List "1") =>
List "100": 0 [final result] (from subset 0 of the List "10")
List "101": 1 [final result] (from subset 1 of the List "10")
List "11": 0 (from subset 10 of the List "1") =>
List "110": 0 [final result] (from subset 0 of the List "11")
List "111": absent [final result] (from subset EMPTY of the List "11")
The advantage of this method is that it allows you to find ANY number of missing numbers in the set, i.e. if more than one is missing.
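A minimal in-memory sketch of this splitting in Python. Lists stand in for the sequential files, and the function name missing_by_bit_partition is hypothetical. At each level it splits on the highest remaining bit, strips that bit, and appends it to the list's "name" (the prefix); an empty list at the bottom means its name is a missing number, so every missing value is returned.

```python
from typing import List


def missing_by_bit_partition(nums: List[int], bits: int) -> List[int]:
    """Return every `bits`-bit value that does not occur in nums."""
    def split(values: List[int], prefix: int, remaining: int) -> List[int]:
        if remaining == 0:
            # All bits consumed: the name (prefix) is missing iff no value got here.
            return [] if values else [prefix]
        top = 1 << (remaining - 1)
        zeros = [v for v in values if not v & top]    # list "<name>0"
        ones = [v & ~top for v in values if v & top]  # list "<name>1", bit removed
        return (split(zeros, prefix << 1, remaining - 1)
                + split(ones, (prefix << 1) | 1, remaining - 1))
    return split(nums, 0, bits)


print(missing_by_bit_partition([0b001, 0b010, 0b110, 0b000, 0b100, 0b011, 0b101], 3))
```

For the seven 3-bit numbers of the example it returns [7], i.e. 111, matching the "absent" list above.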
P.S. AFAIR, for a single missing number out of the complete range there is an even more elegant solution: XOR all the numbers.
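A sketch of the XOR idea in Python, assuming the input holds every integer in [lo, hi] exactly once except one (the function name missing_by_xor and its parameters are hypothetical). XOR-ing the full range with the input cancels every value that appears in both, leaving only the missing one.

```python
from functools import reduce
from operator import xor
from typing import Iterable


def missing_by_xor(nums: Iterable[int], lo: int, hi: int) -> int:
    """Assumes nums holds each integer in [lo, hi] exactly once, except one."""
    expected = reduce(xor, range(lo, hi + 1), 0)  # XOR of the complete range
    actual = reduce(xor, nums, 0)                 # XOR of what is present
    return expected ^ actual


print(missing_by_xor([3, 4, 1, 5], 1, 5))
```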
The idea is to solve an easier problem:
Is the missing value in range [minVal, X] or in (X, maxVal]?
If you know this, you can move X and check again.
For example, you have 3, 4, 1, 5 (2 is missing).
You know that minVal = 1, maxVal = 5.
Range = [1, 5], X = 3: there should be 3 integers in range [1, 3] and 2 in range [4, 5]. There are only 2 in range [1, 3], so you keep looking in range [1, 3].
Range = [1, 3], X = 2: there is only 1 value in range [1, 2], so you keep looking in range [1, 2].
Range = [1, 2], X = 1: range [1, 1] holds its 1 value, and there are no values in range [2, 2], so 2 is your answer.
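EDIT: Some pseudo-C++ code:
minVal = 1, maxVal = 5; // choose correct values
while (minVal < maxVal) {
    int X = (minVal + maxVal) / 2;
    int leftNumber = how many values are in range [minVal, X];
    int rightNumber = how many values are in range [X + 1, maxVal]; // not needed for the decision
    if (leftNumber < (X - minVal + 1)) maxVal = X; // fewer than expected: a value is missing on the left
    else minVal = X + 1;
}
// minVal is now the missing value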
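A runnable Python version of the same counting idea, applied to the 3, 4, 1, 5 example. The name find_missing and its signature are hypothetical, and each probe is a full pass over nums, standing in for a pass over the file. It assumes nums holds fewer than hi - lo + 1 values, all in [lo, hi], so some value in the range is missing; duplicates are fine, because whenever the left half is "full" the right half must then hold fewer values than its size.

```python
from typing import Sequence


def find_missing(nums: Sequence[int], lo: int, hi: int) -> int:
    """Binary search over the value range [lo, hi] by counting."""
    while lo < hi:
        mid = lo + (hi - lo) // 2
        # One sequential pass: how many values fall in the left half [lo, mid]?
        left_count = sum(1 for v in nums if lo <= v <= mid)
        if left_count < mid - lo + 1:
            hi = mid      # the left half has a gap
        else:
            lo = mid + 1  # the left half is "full", so the gap is on the right
    return lo


print(find_missing([3, 4, 1, 5], 1, 5))
```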
Here's a simple C solution which should illustrate the technique. To abstract away any tedious file I/O details, I'm assuming the existence of the following three functions:
unsigned long next_number (void) reads a number from the file and returns it. When called again, the next number in the file is returned, and so on. Behavior when the end of file is encountered is undefined.
int numbers_left (void) returns a true value if there are more numbers available to be read using next_number(), false if the end of the file has been reached.
void return_to_start (void) rewinds the reading position to the start of the file, so that the next call to next_number() returns the first number in the file.
I'm also assuming that unsigned long is at least 32 bits wide, as required for conforming ANSI C implementations; modern C programmers may prefer to use uint32_t from stdint.h instead.
Given these assumptions, here's the solution:
unsigned long count_numbers_in_range (unsigned long min, unsigned long max) {
    unsigned long count = 0;
    return_to_start();  // one full pass over the file per call
    while ( numbers_left() ) {
        unsigned long num = next_number();
        if ( num >= min && num <= max ) {
            count++;
        }
    }
    return count;
}
unsigned long find_missing_number (void) {
    unsigned long min = 0, max = 0xFFFFFFFF;
    while ( min < max ) {
        unsigned long midpoint = min + (max - min) / 2;
        unsigned long count = count_numbers_in_range( min, midpoint );
        if ( count < midpoint - min + 1 ) {
            max = midpoint;      // at least one missing number in [min, midpoint]
        } else {
            min = midpoint + 1;  // no missing numbers up to midpoint, must be above
        }
    }
    return min;
}
One detail to note is that min + (max - min) / 2 is the safe way to calculate the average of min and max; it won't produce bogus results due to overflowing intermediate values like the seemingly simpler (min + max) / 2 might.
Also, even though it would be tempting to solve this problem using recursion, I chose an iterative solution instead for two reasons: first, because it (arguably) shows more clearly what's actually being done, and second, because the task was to minimize memory use, which presumably includes the stack too.
Finally, it would be easy to optimize this code, e.g. by returning as soon as count equals zero, by counting the numbers in both halves of the range in one pass and choosing the one with more missing numbers, or even by extending the binary search to n-ary search for some n > 2 to reduce the number of passes. However, to keep the example code as simple as possible, I've left such optimizations unmade. If you like, you may want to, say, try modifying the code so that it requires at most eight passes over the file instead of the current 32. (Hint: use a 16-element array.)
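One way to follow that hint is sketched below in Python; it is not necessarily the author's intended solution, the function name find_missing_nary and its parameters are hypothetical, and a list stands in for the file. With width=32 and bits_per_pass=4, each pass fills a 16-element counts array for the values sharing the prefix found so far and fixes 4 more bits of the answer, so 8 passes suffice. It assumes width is a multiple of bits_per_pass and that nums holds fewer than 2**width values, all in [0, 2**width).

```python
from typing import Sequence


def find_missing_nary(nums: Sequence[int], width: int = 32,
                      bits_per_pass: int = 4) -> int:
    """n-ary search with n = 2**bits_per_pass buckets per pass."""
    buckets = 1 << bits_per_pass
    prefix, known_bits = 0, 0  # high bits of the answer fixed so far
    while known_bits < width:
        shift = width - known_bits - bits_per_pass
        counts = [0] * buckets  # the 16-element array from the hint
        for v in nums:          # one sequential pass over the "file"
            if v >> (width - known_bits) == prefix:
                counts[(v >> shift) & (buckets - 1)] += 1
        capacity = 1 << shift   # how many distinct values each bucket covers
        # By the pigeonhole principle some bucket holds fewer values than it covers.
        digit = next(d for d in range(buckets) if counts[d] < capacity)
        prefix = (prefix << bits_per_pass) | digit
        known_bits += bits_per_pass
    return prefix


print(hex(find_missing_nary([v for v in range(256) if v != 0xA7], width=8)))
```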
Actually, suppose we have a range of integers from a to b, written [a..b].
And suppose our sequence holds b - a distinct integers from this range. That means exactly one is missing.
And if only one is missing, we can calculate the result in a single loop.
First we calculate the sum of all integers in range [a..b], which equals:
sum = (a + b) * (b - a + 1) / 2
Then we calculate the sum of all integers in our sequence:
long sum1 = 0;
for (int i = 0; i < b - a; i++)
    sum1 += arr[i];
Then we can find the missing element as the difference of those two sums:
long result = sum - sum1;
When you've seen 2^31 zeros (or ones) in the ith bit position, your answer has a one (or a zero) in the ith place. (Ex: 2^31 ones in the 5th binary position means the answer has a zero in the 5th binary position.) This assumes the file holds every 32-bit value exactly once except the missing one.
First draft of C code (the list is passed in as an array, since how it is read from the file is not important to the question):
#include <stdint.h>
/* Assumes list4BILLION holds every 32-bit value exactly once except one,
   i.e. count == 2^32 - 1. */
uint32_t find_missing_by_bit_counts(const uint32_t *list4BILLION, uint64_t count)
{
    const uint64_t halfLimit = UINT64_C(1) << 31; /* 2^31 ones and 2^31 zeros per bit in the full range */
    uint64_t binaryHistogram[32] = {0}; /* number of 1s seen so far in each bit position */
    uint64_t zerosSeen[32] = {0};       /* number of 0s seen so far in each bit position */
    int placesChecked[32] = {0};        /* set to 1 once bit j of the answer is known */
    int placesLeft = 32;
    uint32_t answer = 0;
    uint64_t i;
    int j;
    /* Don't need to pass through the whole list once every bit is known */
    for (i = 0; i < count && placesLeft > 0; i++) {
        uint32_t value = list4BILLION[i];
        for (j = 0; j < 32; j++) {
            if (placesChecked[j])
                continue; /* this bit of the answer is already decided */
            if ((value >> j) & 1u)
                binaryHistogram[j]++;
            else
                zerosSeen[j]++;
            if (binaryHistogram[j] == halfLimit) {
                placesChecked[j] = 1; /* all 2^31 ones seen: answer has a 0 here */
                placesLeft--;
            } else if (zerosSeen[j] == halfLimit) {
                answer |= UINT32_C(1) << j; /* all 2^31 zeros seen: answer has a 1 here */
                placesChecked[j] = 1;
                placesLeft--;
            }
        }
    }
    return answer;
}

haskell, counting how many prime numbers are there in a list

I'm a newbie to Haskell. Currently I need a function 'f' which, given two integers, returns the number of prime numbers between them (i.e., greater than the first integer but smaller than the second).
Main> f 2 4
1
Main> f 2 10
3
Here is my code so far, but it doesn't work. Any suggestions? Thanks.
f :: Int -> Int -> Int
f x y
  | x < y = length [ n | n <- [x..y], y 'mod' n == 0]
  | otherwise = 0
Judging from your example, you want the number of primes in the open interval (x,y), which in Haskell is denoted [x+1 .. y-1].
Your primality testing is flawed; you're testing for factors of y.
To use a function name as an infix operator, use backticks (`), not single quotes (').
Try this instead:
-- note: no need for the otherwise, since [x..y] == [] if x>y
nPrimes a b = length $ filter isPrime [a+1 .. b-1]
Exercise for the reader: implement isPrime. Note that it only takes one argument.
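One possible answer to the exercise, sketched in Python rather than Haskell so that all added examples use one language: trial division up to the square root, plus a count over the open interval (a, b). The names is_prime and n_primes are hypothetical, mirroring isPrime and nPrimes above.

```python
def is_prime(n: int) -> bool:
    """Trial division: n is prime if no d with d*d <= n divides it."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def n_primes(a: int, b: int) -> int:
    """Number of primes strictly between a and b, i.e. in [a+1 .. b-1]."""
    return sum(1 for n in range(a + 1, b) if is_prime(n))


print(n_primes(2, 4), n_primes(2, 10))
```

This should print 1 3, matching f 2 4 and f 2 10 from the question.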
Look at what your list comprehension does.
n <- [x..y]
Draw n from a list ranging from x to y.
y `mod` n == 0
Only select those n which evenly divide y.
length (...)
Find how many such n there are.
What your code currently does is find out how many of the numbers between x and y (inclusive) are factors of y. So if you do f 2 4, the list will be [2, 4] (the numbers that evenly divide 4), and the length of that is 2. If you do f 2 10, the list will be [2, 5, 10] (the numbers that evenly divide 10), and the length of that is 3.
It is important to try to understand for yourself why your code doesn't work. In this case, it's simply the wrong algorithm. For algorithms that find whether a number is prime, among many other sources, you can check the wikipedia article: Primality test.
If you want to work with large intervals, it might be a better idea to compute a list of primes once (instead of doing an isPrime test for every number):
primes = -- A list with all prime numbers
candidates = [a+1 .. b-1]
myprimes = intersectSortedLists candidates primes
nPrimes = length $ myprimes
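A Python sketch of the precompute-once idea: a sieve of Eratosthenes builds the sorted prime list, and binary search (bisect) on that list takes the place of the hypothetical intersectSortedLists. The upper bound of 10,000 and all function names are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right
from math import isqrt
from typing import List


def primes_up_to(limit: int) -> List[int]:
    """Sieve of Eratosthenes: all primes <= limit, in increasing order."""
    if limit < 2:
        return []
    is_p = [True] * (limit + 1)
    is_p[0] = is_p[1] = False
    for i in range(2, isqrt(limit) + 1):
        if is_p[i]:
            for j in range(i * i, limit + 1, i):
                is_p[j] = False
    return [i for i, p in enumerate(is_p) if p]


PRIMES = primes_up_to(10_000)  # computed once; the bound is an arbitrary assumption


def n_primes_sieved(a: int, b: int) -> int:
    """Primes in the open interval (a, b); assumes b - 1 <= 10_000."""
    if b - 1 < a + 1:
        return 0
    return bisect_right(PRIMES, b - 1) - bisect_left(PRIMES, a + 1)


print(n_primes_sieved(2, 10))
```

Each query then costs two binary searches instead of a primality test per number, at the price of fixing the largest supported value in advance.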