Using Dynamic Programming To Group Numbers

Let's say you have a group of numbers. You need to eliminate numbers until there is only one left. This is hard to explain, so let me provide you with an example.
The numbers are 3, 6, 9, and 10.
You pair 3 with 9. You eliminate 3. (Note: either one of them could be eliminated). Now there are 6, 9, and 10 left. You pair 6 with 9. You eliminate 9. Now there are 6 and 10 left. You pair 6 with 10 (only option).
The problem is: I want to find the maximum total obtainable from this elimination. Each time a number is eliminated, the XOR of the two paired numbers is added to the total. In the previous example, the total would be (3 ^ 9) + (6 ^ 9) + (6 ^ 10) = 10 + 15 + 12 = 37. This happens to be the maximum value that can be obtained from any elimination order.
How would I solve this problem in Java with 2000 numbers? I know I can find every possible combination using brute force, but the run time of this was more than two seconds, and I prefer my solutions to be under two seconds. The only option left is Dynamic Programming.
Does anyone know how to solve this with Dynamic Programming?
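One possible approach, sketched here as a swapped-in technique rather than dynamic programming: each elimination pairs the eliminated number with one that is still present, so the n - 1 pairs always form a spanning tree over the numbers, and conversely any spanning tree can be played out by repeatedly eliminating a leaf paired with its neighbor. Under that reading, the maximum total is the weight of a maximum spanning tree of the complete graph whose edge (i, j) weighs a[i] ^ a[j]. The sketch below uses Prim's algorithm in its dense O(n²) form, roughly 4 million XORs for n = 2000. It is written in C++ like the other snippets on this page (it ports directly to Java), assumes non-negative inputs, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Maximum spanning tree (Prim, dense O(n^2) form) on the complete graph in
// which edge (i, j) weighs a[i] ^ a[j]. Assumes non-negative inputs.
long long maxXorElimination(const std::vector<int>& a) {
    const std::size_t n = a.size();
    std::vector<bool> inTree(n, false);
    // best[v]: heaviest edge from v to any vertex already in the tree
    // (-1 = none yet; vertex 0 starts the tree at cost 0).
    std::vector<long long> best(n, -1);
    if (n > 0) best[0] = 0;
    long long total = 0;
    for (std::size_t step = 0; step < n; ++step) {
        // Pick the outside vertex with the heaviest connecting edge.
        std::size_t u = n;
        for (std::size_t v = 0; v < n; ++v)
            if (!inTree[v] && (u == n || best[v] > best[u])) u = v;
        inTree[u] = true;
        total += best[u];  // weight of the tree edge attaching u (0 for the root)
        // Update the best connecting edge of every vertex still outside.
        for (std::size_t v = 0; v < n; ++v)
            if (!inTree[v]) {
                long long w = a[u] ^ a[v];
                if (w > best[v]) best[v] = w;
            }
    }
    return total;
}
```

For the example, maxXorElimination({3, 6, 9, 10}) returns 37. To recover an actual elimination order, one could additionally record which tree vertex produced each best[v] and then eliminate leaves of the resulting tree.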

Related

Find how many numbers meet the constraints in a range

Given two integers l and r, calculate how many numbers in [l, r] meet these constraints:
1) The number should be divisible by 7
2) The number contains at least three 7 digits
3) The number contains more 7s than 4s
777 and 774746 meet these constraints; 7771, 77, and 747474 do not.
Brute force easily finds the answer, but when the range is very large it can take a long time.
I think dynamic programming can help solve this problem, but I can't think of the solution.
Can someone give me some guide?
Starting from the original brute-force version:
- Iterate with i over the numbers in [l, r].
- Use modulo to check whether i is divisible by 7.
- Use modulo and division to get the digit counts of i, then check:
  - digit_count(7) >= 3
  - digit_count(7) > digit_count(4)
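A minimal C++ sketch of that brute-force baseline, useful as a reference to check faster versions against. It assumes the range fits in std::uint64_t and that r is below its maximum value (so the loop terminates); the names meetsCriteria and countBruteForce are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Criteria 1-3 for a single non-negative number.
bool meetsCriteria(std::uint64_t i) {
    if (i % 7 != 0) return false;  // criterion 1: divisible by 7
    int sevens = 0, fours = 0;
    for (std::uint64_t x = i; x > 0; x /= 10) {  // peel off digits with % and /
        const int d = static_cast<int>(x % 10);
        if (d == 7) ++sevens;
        else if (d == 4) ++fours;
    }
    return sevens >= 3 && sevens > fours;  // criteria 2 and 3
}

// Tests every number in [l, r]; assumes r < UINT64_MAX so the loop ends.
std::uint64_t countBruteForce(std::uint64_t l, std::uint64_t r) {
    std::uint64_t count = 0;
    for (std::uint64_t i = l; i <= r; ++i)
        if (meetsCriteria(i)) ++count;
    return count;
}
```

For example, countBruteForce(0, 9999) counts 777, 7077, 7707, 7770, and 7777.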
Here are some ideas I came up with:
1. Use only multiples of 7, implicitly fulfilling the first criterion:
This one is really simple: we can improve the loop to use only values of i that are divisible by 7. If I give you a number x and ask you to generate numbers divisible by n up to and including y, you'd best do:
for (auto i = x + (n - x % n) % n; i <= y; i += n)  // start at the first multiple of n that is >= x
So for multiples of 7 in [l, r], all you need is the loop for (auto i = l + (7 - l % 7) % 7; i <= r; i += 7). It visits only every seventh number, roughly a 7x speed-up over the brute-force version.
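The start expression is easy to get wrong (for x = 10 and n = 7, a naive x + x % n gives 13, which is not a multiple of 7), so here is a small check of the "first multiple at least x" formula. The helper names are hypothetical; the sketch assumes n > 0 and that y + n does not overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// First multiple of n that is >= x (assumes n > 0).
std::uint64_t firstMultipleAtLeast(std::uint64_t x, std::uint64_t n) {
    return x + (n - x % n) % n;  // the outer % n keeps x itself when x is a multiple
}

// All multiples of n in the inclusive range [x, y].
std::vector<std::uint64_t> multiplesInRange(std::uint64_t x, std::uint64_t y, std::uint64_t n) {
    std::vector<std::uint64_t> out;
    for (std::uint64_t i = firstMultipleAtLeast(x, n); i <= y; i += n)
        out.push_back(i);
    return out;
}
```

For example, multiplesInRange(10, 30, 7) yields 14, 21, and 28.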
2. Remember the digit counts
There's no need to run numerous divisions and modulos to get the digit counts of each number you go through. Since you know how much you increment by, you also know which digits change and to what. This way, you only need to split the starting number into digits once (e.g. l + (7 - l % 7) % 7).
What we'd be storing then isn't the digit counts but something closely resembling BCD (binary-coded decimal): an array of digits representing the number we are currently working with. You'd get something like a std::vector<int> holding [7, 7, 2, 4, 5, 7] for the number 772457. Now all you need to do is use BCD-style arithmetic on the array every time you increment the loop counter: [7, 7, 2, 4, 5, 7] + 7 = [7, 7, 2, 4, 6, 4].
The other thing we'd need to store is two ints, sevens and fours. In the initialization phase, once you "disintegrate" the first number into the array, you go through it and increment sevens for each 7 and fours for each 4. Then you keep these numbers up to date: with each update of the array, decrement fours for each 4 you removed and increment it for each 4 you created, and do the same for sevens. You can then evaluate sevens >= 3 && sevens > fours and know the result quickly.
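A minimal sketch of that incremental digit array, assuming non-negative values and increments such as 7. It stores digits least significant first (the "inverted" layout), so a carry out of the top digit simply appends to the vector, and it adjusts sevens and fours only for digits that actually change. The struct name DigitCounter is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Digits of the current number, least significant first, plus running
// counts of 7s and 4s that are only touched for digits that change.
struct DigitCounter {
    std::vector<int> digits;  // digits[0] is the ones place
    int sevens = 0;
    int fours = 0;

    // Split the starting number into digits once.
    explicit DigitCounter(std::uint64_t value) {
        do {
            const int d = static_cast<int>(value % 10);
            digits.push_back(d);
            tally(d, +1);
            value /= 10;
        } while (value > 0);
    }

    // Add a non-negative amount (e.g. 7) with schoolbook carries,
    // updating the counts for each digit that is rewritten.
    void add(int amount) {
        int carry = amount;
        for (std::size_t pos = 0; carry > 0; ++pos) {
            if (pos == digits.size()) digits.push_back(0);  // carry out of the top digit
            const int oldDigit = digits[pos];
            const int sum = oldDigit + carry;
            const int newDigit = sum % 10;
            carry = sum / 10;
            tally(oldDigit, -1);
            tally(newDigit, +1);
            digits[pos] = newDigit;
        }
    }

    bool meetsDigitCriteria() const { return sevens >= 3 && sevens > fours; }

private:
    void tally(int d, int delta) {
        if (d == 7) sevens += delta;
        else if (d == 4) fours += delta;
    }
};
```

For example, DigitCounter c(772457) starts with sevens == 3 and fours == 1; after c.add(7) the digits represent 772464 and both counts are 2.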
The funny thing is that this gives no theoretical improvement in complexity, and it might not even help in practice, although I suspect it should. It's quite a lot of code, so I'm not going to provide it. You might end up storing the BCD array inverted, or starting from the r end of the iteration range, so you don't need to resize the array. And maybe you can come up with many more improvements and tweaks. However, I strongly doubt that the solution can be made asymptotically less complex this way.
3. More thoughts
Now, this wasn't dynamic programming at all. Or was it? I have a gut feeling that this idea of representing the number as an array of digits can be converted into a problem of finding permutations that contain a given combination of digits. You could build a graph out of it and search it, and that's where you'd go dynamic. I'm afraid, however, that this would make for a much longer post...
But I already have one doubt about that: the check for divisibility by 7, which would have to be applied to every number found in the graph (by its nature, the graph would only support criteria 2 and 3 and yield all numbers containing the required combinations). So in the end it comes down to the sizes of the ranges the algorithm should support, and the ratio between numbers fulfilling the first criterion and numbers fulfilling the second and third in those ranges.
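For comparison, here is a sketch of a standard digit DP, a swapped-in technique rather than the graph search described above, that folds the divisibility check into the state instead of applying it afterwards. It walks the digits of the upper limit from most to least significant, tracking the value so far mod 7, the number of 7s and 4s, and whether the prefix still equals the limit's prefix. Leading zeros need no special handling because 0 is neither a 7 nor a 4. The names SevensDigitDp, countUpTo, and countInRange are hypothetical, and the sketch assumes l <= r.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Digit DP over the decimal digits of `limit`, most significant first.
// State: position, value so far mod 7, count of 7s, count of 4s, and whether
// the prefix still equals limit's prefix ("tight").
struct SevensDigitDp {
    std::string digits;
    std::vector<long long> memo;  // non-tight states only; -1 = not computed

    explicit SevensDigitDp(std::uint64_t limit)
        : digits(std::to_string(limit)), memo(21 * 7 * 21 * 21, -1) {}

    long long go(std::size_t pos, int rem, int sevens, int fours, bool tight) {
        if (pos == digits.size())
            return (rem == 0 && sevens >= 3 && sevens > fours) ? 1 : 0;
        const std::size_t key = ((pos * 7 + rem) * 21 + sevens) * 21 + fours;
        if (!tight && memo[key] >= 0) return memo[key];
        const int maxDigit = tight ? digits[pos] - '0' : 9;
        long long total = 0;
        for (int d = 0; d <= maxDigit; ++d)
            total += go(pos + 1, (rem * 10 + d) % 7, sevens + (d == 7),
                        fours + (d == 4), tight && d == maxDigit);
        if (!tight) memo[key] = total;
        return total;
    }
};

// How many n in [0, limit] are divisible by 7, contain at least three 7s,
// and contain more 7s than 4s.
long long countUpTo(std::uint64_t limit) {
    return SevensDigitDp(limit).go(0, 0, 0, 0, true);
}

// Same count for [l, r]; assumes l <= r.
long long countInRange(std::uint64_t l, std::uint64_t r) {
    return countUpTo(r) - (l == 0 ? 0 : countUpTo(l - 1));
}
```

The state space is at most 20 × 7 × 21 × 21 entries with 10 transitions each, so this stays fast even for ranges near the 64-bit limit.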
EDIT:
I have since found that my idea of computing the count of numbers fitting the criteria is incorrect. A small comparison table:
| range | numbers f/ c2 | c2_groups | c2_total | c1_total |
| --- | --- | --- | --- | --- |
| 0 - 1k | `777` | 1 | 1 | ~143 |
| 1k - 10k | `_777`, `7_77`, `77_7`, `777_` | 4 | 40 | ~1286 |
| 10k - 100k | `__777`, `_7_77`, ... | 10 | 1000 | ~12857 |
Here, numbers f/ c2 are the numbers fulfilling criterion 2 (_ stands for any digit), c2_groups is the count of possible arrangements of arbitrary digits and 7s in the number, and cx_total is the total count of numbers fulfilling criterion x in the range.
Given that, it is quite questionable whether filtering by the digit criteria first would be efficient. I suppose answering that would require some mathematical analysis that would take longer than implementing the solution...
Space search
With state equivalent to idea #2, it is possible to do a DFS over the number range. Instead of incrementing by 7, it would store a digits vector and increment the value at a movable offset, e.g.:
increment [1, 0, 7, _] -> [1, 0, 8, _] (the offset points at the third digit before and after)
This is what the algorithm does in its core loop. You can then check whether the current digits vector can still fulfill the criteria: e.g. [0, p, _, _] can fulfill them, while [0, 0, p, _] cannot (p is the element being pointed to). This way, you keep incrementing the highest possible digit, skipping a lot of numbers. Whenever it is still possible to fulfill the requirements, you advance the offset and repeat the process:
push [7, 7, _, _] -> [7, 7, 0, _] (the offset moves from the second digit to the third)
Once you're at the least significant digit position, you also start checking each candidate for divisibility by 7. You can either convert the digits to an int and use modulo, or use a divisibility rule for 7 (these work on digits, which is a pleasant coincidence).
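One digit-based way to do that check, assuming the digits are stored most significant first: fold them with Horner's rule modulo 7, which never builds the full integer and so cannot overflow. The function name mod7 is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Remainder mod 7 of a number whose decimal digits are given most
// significant first, via Horner's rule: r = (r * 10 + d) % 7 per digit.
int mod7(const std::vector<int>& digits) {
    int r = 0;
    for (int d : digits) r = (r * 10 + d) % 7;  // r stays below 7, so no overflow
    return r;
}
```

For example, mod7({7, 7, 8, 7}) returns 3, so 7787 is rejected, while mod7({7, 7, 2, 4, 5, 7}) returns 0.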
This way, you'll find a number that passes all criteria and return it. At some point you will exhaust all the digits at the current position. In that case, you need to move the offset one place back:
pop [7, 7, 7, 9] -> [7, 7, 7, _] (the offset moves from the fourth digit back to the third)
Now you'd increment, see that [7, 7, 8, _] can still fulfill the criteria, and push again. Then run through the sequence 0, 1, 2, ... until you reach 7 and see that 7787 satisfies both the 2nd and 3rd criteria but is not divisible by 7. And so on...
You'll also need to check whether you've already passed the r limit. I guess that can be done quite sanely by splitting r into digits as well and comparing from the most significant digit.
Given that we have no math analysis for this, and that it still goes through quite a lot of numbers (especially when 7 is the least significant digit), I wonder whether this is really worth implementing. But it's not something super-complex either. Good luck!
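A compact sketch of this space search, written recursively instead of with an explicit offset and push/pop: the recursion depth plays the role of the offset, a prefix is abandoned as soon as the remaining positions cannot supply enough 7s, and r is split into digits so the search never goes past it (a tight flag tracks whether the prefix still equals r's prefix). Shorter numbers are covered by leading zeros. It returns every matching number, so it only suits cases where the answer set is modest; the names findAll and searchDigits are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Recursive DFS over digit positions; pos plays the role of the offset.
void searchDigits(const std::string& rDigits, std::vector<int>& digits, std::size_t pos,
                  int sevens, int fours, bool tight, std::uint64_t l,
                  std::vector<std::uint64_t>& out) {
    const int remaining = static_cast<int>(digits.size() - pos);
    // Prune: even if every remaining digit were a 7, criterion 2 or 3 would fail.
    if (sevens + remaining < 3 || sevens + remaining <= fours) return;
    if (remaining == 0) {  // complete number: check the lower bound and criterion 1
        std::uint64_t value = 0;
        for (int d : digits) value = value * 10 + static_cast<std::uint64_t>(d);
        if (value >= l && value % 7 == 0) out.push_back(value);
        return;
    }
    // While tight, the digit may not exceed r's digit at this position.
    const int maxDigit = tight ? rDigits[pos] - '0' : 9;
    for (int d = 0; d <= maxDigit; ++d) {
        digits[pos] = d;
        searchDigits(rDigits, digits, pos + 1, sevens + (d == 7), fours + (d == 4),
                     tight && d == maxDigit, l, out);
    }
}

// All numbers in [l, r] meeting the three criteria, in increasing order.
std::vector<std::uint64_t> findAll(std::uint64_t l, std::uint64_t r) {
    const std::string rDigits = std::to_string(r);
    std::vector<int> digits(rDigits.size(), 0);
    std::vector<std::uint64_t> out;
    searchDigits(rDigits, digits, 0, 0, 0, true, l, out);
    return out;
}
```

For example, findAll(0, 9999) returns 777, 7077, 7707, 7770, and 7777.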
For 1: if (yourint % 7 == 0)
For 2: split the int into its digits (see the "split int into digits" link), check whether each digit equals 7, and count up to 3.
For 3: extend the code from 2 so that a counter is incremented whenever a digit equals 7 or 4 (one counter for each).
At the end, compare your two counters (7s and 4s) to see which one is higher.

Efficient algorithm for finding max number of pairs [duplicate]

This question already has answers here:
Choosing mutually exclusive pairs efficiently
What would be the fastest way to choose pairs from a list of pairs of numbers so that the maximum number of pairs is formed?
For example, I have 6 numbers: 0, 1, 2, 3, 4, 5
Following are the valid pairs:
0 1
0 2
0 3
1 4
3 5
Once a number is included in a pair, it cannot be included in another pair.
That is, if I choose the pair 0 1, I cannot also choose 0 2, because I have already used 0.
I need to choose pairs from the list of valid pairs so that I get the maximum number of pairs.
As per the example:
If I choose the following pairs:
0 1
3 5
Note that with this choice I can pick only these two pairs without repeating a number, and 2 and 4 are left over.
But If I choose the following pairs:
0 2
1 4
3 5
I get three pairs and no number is left over. Similarly, for a given list, I need to calculate the maximum number of pairs I can make. What would be the most efficient way to do it?
This problem can be solved in polynomial time using the blossom algorithm:
http://en.wikipedia.org/wiki/Blossom_algorithm
Form a graph where each number is a node and each valid pair is an edge. Run the blossom algorithm on this graph to find the solution (a maximum matching).
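A compact C++ sketch of Edmonds' blossom algorithm (O(V³)), following the widely used textbook formulation with BFS, an LCA search, and blossom contraction. The class name BlossomMatching and its methods are hypothetical, and vertices are assumed to be numbered 0..n-1.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Edmonds' blossom algorithm: maximum matching in a general graph, O(V^3).
class BlossomMatching {
public:
    explicit BlossomMatching(int n)
        : n_(n), g_(n), match_(n, -1), p_(n, -1), base_(n), q_(n) {}

    void addEdge(int a, int b) { g_[a].push_back(b); g_[b].push_back(a); }

    // Returns the matched pairs (u, v) with u < v, ordered by u.
    std::vector<std::pair<int, int>> solve() {
        for (int i = 0; i < n_; ++i) {
            if (match_[i] != -1) continue;
            // Flip the augmenting path ending at v (if any) back to root i.
            for (int v = findPath(i); v != -1;) {
                int pv = p_[v], ppv = match_[pv];
                match_[v] = pv;
                match_[pv] = v;
                v = ppv;
            }
        }
        std::vector<std::pair<int, int>> pairs;
        for (int v = 0; v < n_; ++v)
            if (match_[v] > v) pairs.emplace_back(v, match_[v]);
        return pairs;
    }

private:
    int n_;
    std::vector<std::vector<int>> g_;
    std::vector<int> match_, p_, base_, q_;
    std::vector<bool> used_, blossom_;

    // Lowest common ancestor (by blossom base) of a and b in the BFS tree.
    int lca(int a, int b) {
        std::vector<bool> seen(n_, false);
        for (;;) {
            a = base_[a];
            seen[a] = true;
            if (match_[a] == -1) break;  // reached the root
            a = p_[match_[a]];
        }
        for (;;) {
            b = base_[b];
            if (seen[b]) return b;
            b = p_[match_[b]];
        }
    }

    // Marks the blossom from v up to base b and re-links parents along it.
    void markPath(int v, int b, int child) {
        while (base_[v] != b) {
            blossom_[base_[v]] = blossom_[base_[match_[v]]] = true;
            p_[v] = child;
            child = match_[v];
            v = p_[match_[v]];
        }
    }

    // BFS from a free root; returns the free end of an augmenting path, or -1.
    int findPath(int root) {
        used_.assign(n_, false);
        p_.assign(n_, -1);
        for (int i = 0; i < n_; ++i) base_[i] = i;
        used_[root] = true;
        int qh = 0, qt = 0;
        q_[qt++] = root;
        while (qh < qt) {
            int v = q_[qh++];
            for (int to : g_[v]) {
                if (base_[v] == base_[to] || match_[v] == to) continue;
                if (to == root || (match_[to] != -1 && p_[match_[to]] != -1)) {
                    // Odd cycle found: contract the blossom onto its base.
                    int curBase = lca(v, to);
                    blossom_.assign(n_, false);
                    markPath(v, curBase, to);
                    markPath(to, curBase, v);
                    for (int i = 0; i < n_; ++i)
                        if (blossom_[base_[i]]) {
                            base_[i] = curBase;
                            if (!used_[i]) { used_[i] = true; q_[qt++] = i; }
                        }
                } else if (p_[to] == -1) {
                    p_[to] = v;
                    if (match_[to] == -1) return to;  // augmenting path found
                    int next = match_[to];            // continue from to's partner
                    used_[next] = true;
                    q_[qt++] = next;
                }
            }
        }
        return -1;
    }
};
```

With the question's edges 0-1, 0-2, 0-3, 1-4, 3-5 added, solve() returns the three pairs (0, 2), (1, 4), (3, 5).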
So your valid pairs could be represented as a graph, and then the maximum number of pairs is a maximum matching in that graph.
Note that you can have multiple solutions. For valid pairs [(0,1),(1,2),(2,3),(3,4)] both [(0, 1), (2, 3)] and [(1, 2), (3, 4)] are solutions.

Generating permutations which are not mirrors of each other

I want to generate permutations of n numbers such that no two generated permutations are reverses of each other (one, read from the last element to the first, equals the other). For instance, for n = 3, I want to generate:
1 2 3 //but not 3 2 1
1 3 2 //but not 2 3 1
2 1 3 //but not 3 1 2
I do not care which of the two is generated. The algorithm should be applicable for large n (> 20). Is there such an algorithm, or a way to check whether a generated permutation is the mirror of a previously generated one?
Use std::next_permutation and ignore permutations whose first element is larger than its last.
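A minimal sketch of that suggestion. It works because reversing a permutation swaps its first and last elements, so for n >= 2 exactly one of p and its reverse has front() < back(), and no bookkeeping of earlier permutations is needed. The function name is hypothetical; it assumes n >= 0, and since it stores all results it is only meant for small n.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Permutations of 1..n, keeping exactly one of each mirror pair: the one
// whose first element is smaller than its last.
std::vector<std::vector<int>> permutationsWithoutMirrors(int n) {
    std::vector<int> p(n);
    for (int i = 0; i < n; ++i) p[i] = i + 1;  // start from the sorted permutation
    std::vector<std::vector<int>> out;
    do {
        // For n < 2 the single permutation is its own mirror, so keep it.
        if (n < 2 || p.front() < p.back()) out.push_back(p);
    } while (std::next_permutation(p.begin(), p.end()));
    return out;
}
```

For n = 3 this yields 1 2 3, 1 3 2, and 2 1 3, matching the list in the question.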
No. With today's usual hardware and software you cannot do this for n around 20, because there are 20!/2 ≈ 1.2 × 10^18 such permutations, which means it would take many years just to generate them.

C++ two dimensional array of bitsets

I have an assignment where we're tackling the traveling salesman problem.
I'm not going to lie: I don't fully understand what the part I'm doing right now is asking, so sorry if I phrase this question weirdly.
I sort of get it, but not fully.
We're calculating an approximate distance for the salesman. We need to create a two-dimensional array, of bitsets I believe? Storing the values in binary anyway.
0 represents that the city hasn't been visited, and 1 represents that it has been visited.
We've been given an algorithm that helps significantly, and I should be able to finish it if anyone here can help with the first step:
Create memoisation table [N][(1 << N)]
(where N = number of cities).
I get that 1 << N means convert the number of cities (e.g. 5) to binary, then move the set to the left by one place.
My main issues are:
Converting N to binary (I think this is what I need to do?)
Moving the set to the left by one
Actually creating the 2-dimensional array of these sizes...
I could be wrong here, in fact that's probably pretty likely... any help is appreciated, thanks!
Here is the general rule: the "<<" operator means left shift and ">>" means right shift. Right-shifting a non-negative number by 1 is equivalent to dividing it by 2 (rounding down), and left-shifting it by 1 is equivalent to multiplying it by 2. For example, take the number 7 (binary 111): 7 << 1 becomes 1110, which is 7 * 2 = 14, and 7 >> 1 becomes 11, which is 7 / 2 = 3 in integer division.
So the algorithm to convert a number N to binary, as a bitset array, is:
1. Compute N mod 2 (the remainder when you divide N by 2).
2. Store the remainder in a collection (e.g. a list, array, or stack).
3. Divide N by 2.
4. If the new N is greater than 0, repeat from step 1.
5. Otherwise, reverse the collection and you have your bitset.
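Those steps as a small C++ function, returning the bits as a string for easy printing; the name toBinary is hypothetical. For a fixed width, the standard library does the same with std::bitset, e.g. std::bitset<4>(5).to_string() gives "0101".

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Repeated division by 2: collect the remainders, then reverse them.
std::string toBinary(unsigned n) {
    if (n == 0) return "0";
    std::string bits;
    while (n > 0) {
        bits.push_back(n % 2 ? '1' : '0');  // steps 1-2: store the remainder
        n /= 2;                             // step 3: divide by 2
    }                                       // step 4: repeat while n > 0
    std::reverse(bits.begin(), bits.end()); // step 5: reverse
    return bits;
}
```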
As for moving the set left by one: if you mean a left shift by one, you can do it with N << 1.
This is how you create a two-dimensional array in C++:
int TwoDimensionalArray[ROWS][COLS];  // any element type works; ROWS and COLS must be compile-time constants
For this problem, though, I believe you might want to read about C++ bitset; you can easily implement it using a bitset. For that, you just have to figure out the size of the bitset you want to use. For a visited-set you need one bit per city, so N bits for N cities. (Storing a single value is different: if the highest value is 15, you need 4 bits, because the maximum number 4 bits can represent is 15, binary 1111.) Hope this helps.
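For the memoisation table itself, a minimal sketch assuming the usual bitmask formulation: the row is the current city, the column index is the set of visited cities (bit i set means city i has been visited, matching the 0/1 convention in the question), and -1 marks an entry not computed yet. 1 << N is simply 2^N, the number of possible visited-sets, so N itself never needs converting to binary. The function names are hypothetical, and N is assumed small (below 31, and in practice far smaller, since the table has N × 2^N entries).

```cpp
#include <cassert>
#include <vector>

// Memoisation table [N][1 << N]: row = current city, column = bitmask of
// visited cities (bit i set = city i visited). -1 = not computed yet.
std::vector<std::vector<int>> makeMemoTable(int numCities) {
    const int numMasks = 1 << numCities;  // 2^N visited-sets, e.g. 5 cities -> 32
    return std::vector<std::vector<int>>(numCities, std::vector<int>(numMasks, -1));
}

// Bit helpers for a visited-set mask.
bool isVisited(int mask, int city) { return (mask & (1 << city)) != 0; }
int markVisited(int mask, int city) { return mask | (1 << city); }
```

For example, markVisited(0, 2) gives 4 (binary 100), i.e. only city 2 visited.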

Compression of sorted data with small difference

I have a sorted sequence of integers. The maximal difference between two consecutive numbers is 3, so the data looks, for example, like this:
Data: 1 2 3 5 7 8 9 10 13 14
Differences: (start 1) 1 1 2 2 1 1 1 3 1
Is there a better way to store (compress) this type of sequence than saving the difference values? Dictionary-based methods fail to compress it because of the randomness of the numbers 1, 2, and 3. With "PAQ"-style compression the results are better, but still not quite satisfying. Huffman and arithmetic coders do worse than dictionary-based methods.
Is there some way with prediction?
For example, use regression on the original data and then store the differences (which could be smaller or more consistent)?
Or use some kind of prediction based on histogram of differences?
Or something totally different... or is it not possible at all (which is, in my opinion, the real answer :))
Since you say in the comments that you're already storing four differences per byte, you're unlikely to do much better. If the differences 0, 1, 2, and 3 were random and evenly distributed, there would be no way to do better.
If they are not evenly distributed, then you might be able to do better with a Huffman or arithmetic code. E.g. if 1 is more common than 0, which is more common than 2 and 3, then you could store 1 as 0, 0 as 10, 2 as 110, and 3 as 111. Or if 0 never happens, 1 as 0, 2 and 3 as 10 and 11. You could do better with an arithmetic code for the case you quote where 1 occurs 80% of the time. Or a poor man's arithmetic code by coding pairs of symbols. E.g.:
| pair | code |
| --- | --- |
| 11 | 0 |
| 13 | 100 |
| 21 | 101 |
| 12 | 110 |
| 31 | 1110 |
| 22 | 111100 |
| 23 | 111101 |
| 32 | 111110 |
| 33 | 111111 |
would be a good code for the distribution 1: 80%, 2: 10%, 3: 10%. (That doesn't quite handle an odd number of differences, but you could deal with that with a single bit at the start indicating an even or odd count, and a few more bits at the end if odd.)
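A sketch of that pair code as an encoder, producing the bits as a string of '0'/'1' characters for readability. The odd-count handling is an assumption filling in the details left open: one leading bit set to 1 when the count is odd, and the final lone difference coded with the single-symbol code 1 as 0, 2 as 10, 3 as 11. All differences are assumed to be 1, 2, or 3 (std::map::at throws std::out_of_range otherwise); the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Encodes differences in {1, 2, 3} as a string of '0'/'1' characters:
// a parity bit (1 = odd count), the pair code for each consecutive pair,
// and, for an odd count, the final difference with 1 -> 0, 2 -> 10, 3 -> 11.
std::string encodeDifferences(const std::vector<int>& diffs) {
    static const std::map<std::pair<int, int>, std::string> pairCode = {
        {{1, 1}, "0"},      {{1, 3}, "100"},    {{2, 1}, "101"},
        {{1, 2}, "110"},    {{3, 1}, "1110"},   {{2, 2}, "111100"},
        {{2, 3}, "111101"}, {{3, 2}, "111110"}, {{3, 3}, "111111"}};
    static const std::map<int, std::string> singleCode = {{1, "0"}, {2, "10"}, {3, "11"}};

    std::string bits(1, diffs.size() % 2 == 1 ? '1' : '0');
    std::size_t i = 0;
    for (; i + 1 < diffs.size(); i += 2)
        bits += pairCode.at({diffs[i], diffs[i + 1]});  // at() throws for values outside 1..3
    if (i < diffs.size()) bits += singleCode.at(diffs[i]);  // leftover odd difference
    return bits;
}
```

For the question's differences 1 1 2 2 1 1 1 3 1 this gives the 13-bit string 1011110001000, versus 18 bits at 2 bits per difference. Under the 80/10/10 distribution the pair code averages 1.92 bits per pair, i.e. 0.96 bits per difference.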
There might be a better predictor than the previous value: a function of the n previous values instead of just one. However, this would be highly data-dependent. For example, you could assume that the current value is likely to fall on the line through the previous two values, or on the parabola through the previous three. Or some other function, e.g. a sinusoid with some frequency, if the data is biased that way.
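A minimal sketch of the "line through the previous two values" predictor, storing residuals x[i] - (2*x[i-1] - x[i-2]) instead of plain differences; the first two values fall back to differences from 0 and from x[0]. It is reversible, as reconstruct shows. The function names are hypothetical, and whether the residuals compress better than the differences depends entirely on the data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Prediction used for x[i]: the line through the previous two values
// (2*x[i-1] - x[i-2]); the first two values fall back to 0 and x[0].
long long predict(const std::vector<long long>& x, std::size_t i) {
    if (i == 0) return 0;
    if (i == 1) return x[0];
    return 2 * x[i - 1] - x[i - 2];
}

// Residuals to store instead of plain differences.
std::vector<long long> linearResiduals(const std::vector<long long>& x) {
    std::vector<long long> r;
    for (std::size_t i = 0; i < x.size(); ++i) r.push_back(x[i] - predict(x, i));
    return r;
}

// Inverse: rebuild the data from residuals, one value at a time.
std::vector<long long> reconstruct(const std::vector<long long>& r) {
    std::vector<long long> x;
    for (std::size_t i = 0; i < r.size(); ++i) x.push_back(predict(x, i) + r[i]);
    return x;
}
```

On the question's data 1 2 3 5 7 8 9 10 13 14 the residuals are 1 1 0 1 0 -1 0 0 2 -2: more zeros than the differences, but over a larger alphabet (-2..2), which illustrates how data-dependent the choice of predictor is.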