Does this problem have overlapping subproblems? - c++

I am trying to solve this question on LeetCode.com:
You are given an m x n integer matrix mat and an integer target. Choose one integer from each row in the matrix such that the absolute difference between target and the sum of the chosen elements is minimized. Return the minimum absolute difference. (The absolute difference between two numbers a and b is the absolute value of a - b.)
So for input mat = [[1,2,3],[4,5,6],[7,8,9]], target = 13, the output should be 0 (since 1+5+7=13).
The solution I am referring is as below:
int dp[71][70 * 70 + 1] = {[0 ... 70][0 ... 70 * 70] = INT_MAX};
int dfs(vector<set<int>>& m, int i, int sum, int target) {
if (i >= m.size())
return abs(sum - target);
if (dp[i][sum] == INT_MAX) {
for (auto it = begin(m[i]); it != end(m[i]); ++it) {
dp[i][sum] = min(dp[i][sum], dfs(m, i + 1, sum + *it, target));
if (dp[i][sum] == 0 || sum + *it > target)
break;
}
} else {
// cout<<"Encountered a previous value!\n";
}
return dp[i][sum];
}
int minimizeTheDifference(vector<vector<int>>& mat, int target) {
vector<set<int>> m;
for (auto &row : mat)
m.push_back(set<int>(begin(row), end(row)));
return dfs(m, 0, 0, target);
}
I don't follow how this problem is solvable by dynamic programming. The states apparently are the row i and the sum (from row 0 to row i-1). Given that the problem constraints are:
m == mat.length
n == mat[i].length
1 <= m, n <= 70
1 <= mat[i][j] <= 70
1 <= target <= 800
My understanding is that we would never encounter a sum that we have previously encountered (all values are positive). Even the debug cout statement that I added does not print anything on the sample inputs given in the problem.
How could dynamic programming be applicable here?

This problem is NP-hard, since the 0-1 knapsack problem reduces to it pretty easily.
This problem also has a dynamic programming solution that is similar to the one for 0-1 knapsack:
Find all the sums you can make with a number from the first row (that's just the numbers in the first row):
For each subsequent row, add all the numbers from the ith row to all the previously accessible sums to find the sums you can get after i rows.
If you need to be able to recreate a path through the matrix, then for each sum at each level, remember the preceding one from the previous level.
There are indeed overlapping subproblems, because there will usually be multiple ways to get a lot of the sums, and you only have to remember and continue from one of them.
Here is your example:
sums from row 1: 1, 2, 3
sums from rows 1-2: 5, 6, 7, 8, 9
sums from rows 1-3: 12, 13, 14, 15, 16, 17, 18
As you see, we can make the target sum. There are a few ways:
7+4+2, 7+5+1, 8+4+1
Some targets like 15 have a lot more ways. As the size of the matrix increases, the amount of overlap tends to increase, and so this solutions is reasonably efficient in many cases. The total complexity is in O(M * N * max_weight).
But, this is an NP-hard problem, so this is not always tractable -- max_weight can grow exponentially with the size of the problem.

Related

C++: What are some general ways to make code more efficient for use with large numbers?

Please when answering this question try to be as general as possible to help the wider community, rather than just specifically helping my issue (although helping my issue would be great too ;) )
I seem to be encountering this problem time and time again with the simple problems on Project Euler. Most commonly are the problems that require a computation of the prime numbers - these without fail always fail to terminate for numbers greater than about 60,000.
My most recent issue is with Problem 12:
The sequence of triangle numbers is generated by adding the natural numbers. So the 7th triangle number would be 1 + 2 + 3 + 4 + 5 + 6 + 7 = 28. The first ten terms would be:
1, 3, 6, 10, 15, 21, 28, 36, 45, 55, ...
Let us list the factors of the first seven triangle numbers:
1: 1
3: 1,3
6: 1,2,3,6
10: 1,2,5,10
15: 1,3,5,15
21: 1,3,7,21
28: 1,2,4,7,14,28
We can see that 28 is the first triangle number to have over five divisors.
What is the value of the first triangle number to have over five hundred divisors?
Here is my code:
#include <iostream>
#include <vector>
#include <cmath>
using namespace std;
int main() {
int numberOfDivisors = 500;
//I begin by looping from 1, with 1 being the 1st triangular number, 2 being the second, and so on.
for (long long int i = 1;; i++) {
long long int triangularNumber = (pow(i, 2) + i)/2
//Once I have the i-th triangular, I loop from 1 to itself, and add 1 to count each time I encounter a divisor, giving the total number of divisors for each triangular.
int count = 0;
for (long long int j = 1; j <= triangularNumber; j++) {
if (triangularNumber%j == 0) {
count++;
}
}
//If the number of divisors is 500, print out the triangular and break the code.
if (count == numberOfDivisors) {
cout << triangularNumber << endl;
break;
}
}
}
This code gives the correct answers for smaller numbers, and then either fails to terminate or takes an age to do so!
So firstly, what can I do with this specific problem to make my code more efficient?
Secondly, what are some general tips both for myself and other new C++ users for making code more efficient? (I.e. applying what we learn here in the future.)
Thanks!
The key problem is that your end condition is bad. You are supposed to stop when count > 500, but you look for an exact match of count == 500, therefore you are likely to blow past the correct answer without detecting it, and keep going ... maybe forever.
If you fix that, you can post it to code review. They might say something like this:
Break it down into separate functions for finding the next triangle number, and counting the factors of some number.
When you find the next triangle number, you execute pow. I perform a single addition.
For counting the number of factors in a number, a google search might help. (e.g. http://www.cut-the-knot.org/blue/NumberOfFactors.shtml ) You can build a list of prime numbers as you go, and use that to quickly find a prime factorization, from which you can compute the number of factors without actually counting them. When the numbers get big, that loop gets big.
Tldr: 76576500.
About your Euler problem, some math:
Preliminary 1:
Let's call the n-th triangle number T(n).
T(n) = 1 + 2 + 3 + ... + n = (n^2 + n)/2 (sometimes attributed to Gauss, sometimes someone else). It's not hard to figure it out:
1+2+3+4+5+6+7+8+9+10 =
(1+10) + (2+9) + (3+8) + (4+7) + (5+6) =
11 + 11 + 11 + 11 + 11 =
55 =
110 / 2 =
(10*10 + 10)/2
Because of its definition, it's trivial that T(n) + n + 1 = T(n+1), and that with a<b, T(a)<T(b) is true too.
Preliminary 2:
Let's call the divisor count D. D(1)=1, D(4)=3 (because 1 2 4).
For a n with c non-repeating prime factors (not just any divisors, but prime factors, eg. n = 42 = 2 * 3 * 7 has c = 3), D(n) is c^2: For each factor, there are two possibilites (use it or not). The 9 possibile divisors for the examples are: 1, 2, 3, 7, 6 (2*3), 14 (2*7), 21 (3*7), 42 (2*3*7).
More generally with repeating, the solution for D(n) is multiplying (Power+1) together. Example 126 = 2^1 * 3^2 * 7^1: Because it has two 3, the question is no "use 3 or not", but "use it 1 time, 2 times or not" (if one time, the "first" or "second" 3 doesn't change the result). With the powers 1 2 1, D(126) is 2*3*2=12.
Preliminary 3:
A number n and n+1 can't have any common prime factor x other than 1 (technically, 1 isn't a prime, but whatever). Because if both n/x and (n+1)/x are natural numbers, (n+1)/x - n/x has to be too, but that is 1/x.
Back to Gauss: If we know the prime factors for a certain n and n+1 (needed to calculate D(n) and D(n+1)), calculating D(T(n)) is easy. T(N) = (n^2 + n) / 2 = n * (n+1) / 2. As n and n+1 don't have common prime factors, just throwing together all factors and removing one 2 because of the "/2" is enough. Example: n is 7, factors 7 = 7^1, and n+1 = 8 = 2^3. Together it's 2^3 * 7^1, removing one 2 is 2^2 * 7^1. Powers are 2 1, D(T(7)) = 3*2 = 6. To check, T(7) = 28 = 2^2 * 7^1, the 6 possible divisors are 1 2 4 7 14 28.
What the program could do now: Loop through all n from 1 to something, always factorize n and n+1, use this to get the divisor count of the n-th triangle number, and check if it is >500.
There's just the tiny problem that there are no efficient algorithms for prime factorization. But for somewhat small numbers, todays computers are still fast enough, and keeping all found factorizations from 1 to n helps too for finding the next one (for n+1). Potential problem 2 are too large numbers for longlong, but again, this is no problem here (as can be found out with trying).
With the described process and the program below, I got
the 12375th triangle number is 76576500 and has 576 divisors
#include <iostream>
#include <vector>
#include <cstdint>
using namespace std;
const int limit = 500;
vector<uint64_t> knownPrimes; //2 3 5 7...
//eg. [14] is 1 0 0 1 ... because 14 = 2^1 * 3^0 * 5^0 * 7^1
vector<vector<uint32_t>> knownFactorizations;
void init()
{
knownPrimes.push_back(2);
knownFactorizations.push_back(vector<uint32_t>(1, 0)); //factors for 0 (dummy)
knownFactorizations.push_back(vector<uint32_t>(1, 0)); //factors for 1 (dummy)
knownFactorizations.push_back(vector<uint32_t>(1, 1)); //factors for 2
}
void addAnotherFactorization()
{
uint64_t number = knownFactorizations.size();
size_t len = knownPrimes.size();
for(size_t i = 0; i < len; i++)
{
if(!(number % knownPrimes[i]))
{
//dividing with a prime gets a already factorized number
knownFactorizations.push_back(knownFactorizations[number / knownPrimes[i]]);
knownFactorizations[number][i]++;
return;
}
}
//if this failed, number is a newly found prime
//because a) it has no known prime factors, so it must have others
//and b) if it is not a prime itself, then it's factors should've been
//found already (because they are smaller than the number itself)
knownPrimes.push_back(number);
len = knownFactorizations.size();
for(size_t s = 0; s < len; s++)
{
knownFactorizations[s].push_back(0);
}
knownFactorizations.push_back(knownFactorizations[0]);
knownFactorizations[number][knownPrimes.size() - 1]++;
}
uint64_t calculateDivisorCountOfN(uint64_t number)
{
//factors for number must be known
uint64_t res = 1;
size_t len = knownFactorizations[number].size();
for(size_t s = 0; s < len; s++)
{
if(knownFactorizations[number][s])
{
res *= (knownFactorizations[number][s] + 1);
}
}
return res;
}
uint64_t calculateDivisorCountOfTN(uint64_t number)
{
//factors for number and number+1 must be known
uint64_t res = 1;
size_t len = knownFactorizations[number].size();
vector<uint32_t> tmp(len, 0);
size_t s;
for(s = 0; s < len; s++)
{
tmp[s] = knownFactorizations[number][s]
+ knownFactorizations[number+1][s];
}
//remove /2
tmp[0]--;
for(s = 0; s < len; s++)
{
if(tmp[s])
{
res *= (tmp[s] + 1);
}
}
return res;
}
int main()
{
init();
uint64_t number = knownFactorizations.size() - 2;
uint64_t DTn = 0;
while(DTn <= limit)
{
number++;
addAnotherFactorization();
DTn = calculateDivisorCountOfTN(number);
}
uint64_t tn;
if(number % 2) tn = ((number+1)/2)*number;
else tn = (number/2)*(number+1);
cout << "the " << number << "th triangle number is "
<< tn << " and has " << DTn << " divisors" << endl;
return 0;
}
About your general question about speed:
1) Algorithms.
How to know them? For (relatively) simple problems, either reading a book/Wikipedia/etc. or figuring it out if you can. For harder stuff, learning more basic things and gaining experience is necessary before it's even possible to understand them, eg. studying CS and/or maths ... number theory helps a lot for your Euler problem. (It will help less to understand how a MP3 file is compressed ... there are many areas, it's not possible to know everything.).
2a) Automated compiler optimizations of frequently used code parts / patterns
2b) Manual timing what program parts are the slowest, and (when not replacing it with another algorithm) changing it in a way that eg. requires less data send to slow devices (HDD, hetwork...), less RAM memory access, less CPU cycles, works better together with OS scheduler and memory management strategies, uses the CPU pipeline/caches better etc.etc. ... this is both education and experience (and a big topic).
And because long variables have a limited size, sometimes it is necessary to use custom types that use eg. a byte array to store a single digit in each byte. That way, it's possible to use the whole RAM for a single number if you want to, but the downside is you/someone has to reimplement stuff like addition and so on for this kind of number storage. (Of course, libs for that exist already, without writing everything from scratch).
Btw., pow is a floating point function and may get you inaccurate results. It's not appropriate to use it in this case.

Is there any number repeated in the array?

There's array of size n. The values can be between 0 and (n-1) as the indices.
For example: array[4] = {0, 2, 1, 3}
I should say if there's any number that is repeated more than 1 time.
For example: array[5] = {3,4,1,2,4} -> return true because 4 is repeated.
This question has so many different solutions and I would like to know if this specific solution is alright (if yes, please prove, else refute).
My solution (let's look at the next example):
array: indices 0 1 2 3 4
values 3 4 1 2 0
So I suggest:
count the sum of the indices (4x5 / 2 = 10) and check that the values' sum (3+4+1+2+0) is equal to this sum. if not, there's repeated number.
in addition to the first condition, get the multiplication of the indices(except 0. so: 1x2x3x4) and check if it's equal to the values' multiplication (except 0, so: 3x4x1x2x0).
=> if in each condition, it's equal then I say that there is NO repeated number. otherwise, there IS a repeated number.
Is it correct? if yes, please prove it or show me a link. else, please refute it.
Why your algorithm is wrong?
Your solution is wrong, here is a counter example (there may be simpler ones, but I found this one quite quickly):
int arr[13] = {1, 1, 2, 3, 4, 10, 6, 7, 8, 9, 10, 11, 6};
The sum is 78, and the product is 479001600, if you take the normal array of size 13:
int arr[13] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
It also has a sum of 78 and a product of 479001600 so your algorithm does not work.
How to find counter examples?1
To find a counter example2 3:
Take an array from 0 to N - 1;
Pick two even numbers3 M1 > 2 and M2 > 2 between 0 and N - 1 and halve them;
Replace P1 = M1/2 - 1 by 2 * P1 and P2 = M2/2 + 1 by 2 * P2.
In the original array you have:
Product = M1 * P1 * M2 * P2
Sum = 0 + M1 + P1 + M2 + P2
= M1 + M1/2 - 1 + M2 + M2/2 + 1
= 3/2 * (M1 + M2)
In the new array you have:
Product = M1/2 * 2 * P1 + M2/2 * 2 * P2
= M1 * P1 * M2 * P2
Sum = M1/2 + 2P1 + M2/2 + 2P2
= M1/2 + 2(M1/2 - 1) + M2/2 + 2(M2/2 + 1)
= 3/2 * M1 - 2 + 3/2 * M2 + 2
= 3/2 * (M1 + M2)
So both array have the same sum and product, but one has repeated values, so your algorithm does not work.
1 This is one method of finding counter examples, there may be others (there are probably others).
2 This is not exactly the same method I used to find the first counter example - In the original method, I used only one number M and was using the fact that you can replace 0 by 1 without changing the product, but I propose a more general method here in order to avoid argument such as "But I can add a check for 0 in my algorithm.".
3 That method does not work with small array because you need to find 2 even numbers M1 > 2 and M2 > 2 such that M1/2 != M2 (and reciprocally) and M1/2 - 1 != M2/2 + 1, which (I think) is not possible for any array with a size lower than 14.
What algorithms do work?4
Algorithm 1: O(n) time and space complexity.
If you can allocate a new array of size N, then:
template <std::size_t N>
bool has_repetition (std::array<int, N> const& array) {
std::array<bool, N> rep = {0};
for (auto v: array) {
if (rep[v]) {
return true;
}
rep[v] = true;
}
return false;
}
Algorithm 2: O(nlog(n)) time complexity and O(1) space complexity, with a mutable array.
You can simply sort the array:
template <std::size_t N>
bool has_repetition (std::array<int, N> &array) {
std::sort(std::begin(array), std::end(array));
auto it = std::begin(array);
auto ne = std::next(it);
while (ne != std::end(array)) {
if (*ne == *it) {
return true;
}
++it; ++ne;
}
return false;
}
Algorithm 3: O(n^2) time complexity and O(1) space complexity, with non mutable array.
template <std::size_t N>
bool has_repetition (std::array<int, N> const& array) {
for (auto it = std::begin(array); it != std::end(array); ++it) {
for (auto jt = std::next(it); jt != std::end(array); ++jt) {
if (*it == *jt) {
return true;
}
}
}
return false;
}
4 These algorithms do work, but there may exist other ones that performs better - These are only the simplest ones I could think of given some "restrictions".
What's wrong with your method?
Your method computes some statistics of the data and compares them with those expected for a permutation (= correct answers). While a violation of any of these comparisons is conclusive (the data cannot satisfy the constraint), the inverse is not necessarily the case. You only look at two statistics, and these are too few for sufficiently large data sets. Owing to the fact that the data are integer, the smallest number of data for which your method may fail is larger than 3.
If you are searching duplicates in your array there is simple way:
int N =5;
int array[N] = {1,2,3,4,4};
for (int i = 0; i< N; i++){
for (int j =i+1; j<N; j++){
if(array[j]==array[i]){
std::cout<<"DUPLICATE FOUND\n";
return true;
}
}
}
return false;
Other simple way to find duplicates is using the std::set container for example:
std::set<int> set_int;
set_int.insert(5);
set_int.insert(5);
set_int.insert(4);
set_int.insert(4);
set_int.insert(5);
std::cout<<"\nsize "<<set_int.size();
the output will be 2, because there is 2 individual values
A more in depth explanation why your algorithm is wrong:
count the sum of the indices (4x5 / 2 = 10) and check that the values' sum (3+4+1+2+0) is equal to this sum. if not, there's repeated number.
Given any array A which has no duplicates, it is easy to create an array that meets your first requirement but now contains duplicates. Just take take two values and subtract one of them by some value v and add that value to the other one. Or take multiple values and make sure the sum of them stays the same. (As long as new values are still within the 0 .. N-1 range.) For N = 3 it is already possible to change {0,1,2} to {1,1,1}. For an array of size 3, there are 7 compositions that have correct sum, but 1 is a false positive. For an array of size 4 there are 20 out of 44 have duplicates, for an array of size 5 that's 261 out of 381, for an array of size 6 that's 3612 out of 4332, and so on. It is save to say that the number of false positives grows much faster than real positives.
in addition to the first condition, get the multiplication of the indices(except 0. so: 1x2x3x4) and check if it's equal to the values' multiplication (except 0, so: 3x4x1x2x0).
The second requirement involves the multiplication of all indices above 0. It is easy to realize this is could never be a very strong restriction either. As soon as one of the indices is not prime, the product of all indices is no longer uniquely tied to the multiplicands and a list can be constructed of different values with the same result. E.g. a pair of 2 and 6 can be replaced with 3 and 4, 2 and 9 can be replaced with 6 and 3 and so on. Obviously the number of false positives increases as the array-size gets larger and more non-prime values are used as multiplicands.
None of these requirements is really strong and the cannot compensate for the other. Since 0 is not even considered for the second restriction a false positive can be created fairly easy for arrays starting at size 5. any pair of 0 and 4 can simply be replaced with two 2's in any unique array, for example {2, 1, 2, 3, 2}
What you would need, is to have a result that is uniquely tight to the occurring values. You could tweak your second requirement to a more complex approach and skip over the non-prime values and take 0 into account. For example you could use the first prime as multiplicand (2) for 0, use 3 as multiplicand for 1, 5 as multiplicand for 2, and so on. That would work (you would not need the first requirement), but this approach would be overly complex. An simpler way to get a unique result would be to OR the i-th bit for each value (0 => 1 << 0, 1 => 1 << 1, 2 => 1 << 2, and so on. (Obviously it is faster to check wether a bit was already set by a reoccurring value, rather than wait for the final result. And this is conceptually the same as using a bool array/vector from the other examples!)

How to calculate the minimum cost to convert all n numbers in an array to m?

I have been given the following assignment:
Given N integers in the form of A(i) where 1≤i≤N, make each number
A(i) in the N numbers equal to M. To convert a number A(i) to M, it
will cost |M−Ai| units. Find out the minimum cost to convert all the N
numbers to M, so you should choose the best M to get the minimum cost.
Given:
1 <= N <= 10^5
1 <= A(i) <= 10^9
My approach was to calculate the sum of all numbers and find avg = sum / n and then subtract each number by avg to get the minimum cost.
But this fails in many test cases. How can I find the optimal solution for this?
You should take the median of the numbers (or either of the two numbers nearest the middle if the list has even length), not the mean.
An example where the mean fails to minimize is: [1, 2, 3, 4, 100]. The mean is 110 / 5 = 22, and the total cost is 21 + 20 + 19 + 18 + 78 = 156. Choosing the median (3) gives total cost: 2 + 1 + 0 + 1 + 97 = 101.
An example where the median lies between two items in the list is [1, 2, 3, 4, 5, 100]. Here the median is 3.5, and it's ok to either use M=3 or M=4. For M=3, the total cost is 2 + 1 + 0 + 1 + 2 + 97 = 103. For M=4, the total cost is 3 + 2 + 1 + 0 + 1 + 96 = 103.
A formal proof of correctness can be found on Mathematics SE, although you may convince yourself of the result by noting that if you nudge M a small amount delta in one direction (but not past one of the data points) -- and for example's sake let's say it's in the positive direction, the total cost increases by delta times the number of points to the left of M minus delta times the number of points to the right of M. So M is minimized when the number of points to its left and the right are equal in number, otherwise you could move it a small amount one way or the other to decrease the total cost.
#PaulHankin already provided a perfect answer. Anyway, when thinking about the problem, I didn't think of the median being the solution. But even if you don't know about the median, you can come up with a programming solution.
I made similar observations as #PaulHankin in the last paragraph of his last answer. This made me realize, that I have to eliminate outliers iteratively in order to find m. So I wrote a program that first sorts the input array (vector) A and then analyzes the minimum and maximum values.
The idea is to move the minimum values towards the second smallest values and the maximum values towards the second largest values. You always move either the minimum or maximum values, depending on whether you have less minimum values than maximum values or not. If all array items end up being the same value, then you found your m:
#include <vector>
#include <algorithm>
#include <iostream>
using namespace std;
int getMinCount(vector<int>& A);
int getMaxCount(vector<int>& A);
int main()
{
// Example as given by #PaulHankin
vector<int> A;
A.push_back(1);
A.push_back(2);
A.push_back(3);
A.push_back(4);
A.push_back(100);
sort(A.begin(), A.end());
int minCount = getMinCount(A);
int maxCount = getMaxCount(A);
while (minCount != A.size() && maxCount != A.size())
{
if(minCount <= maxCount)
{
for(int i = 0; i < minCount; i++)
A[i] = A[minCount];
// Recalculate the count of the minium value, because we changed the minimum.
minCount = getMinCount(A);
}
else
{
for(int i = 0; i < maxCount; i++)
A[A.size() - 1 - i] = A[A.size() - 1 - maxCount];
// Recalculate the count of the maximum value, because we changed the maximum.
maxCount = getMaxCount(A);
}
}
// Print out the one and only remaining value, which is m.
cout << A[0] << endl;
return 0;
}
int getMinCount(vector<int>& A)
{
// Count how often the minimum value exists.
int minCount = 1;
int pos = 1;
while (pos < A.size() && A[pos++] == A[0])
minCount++;
return minCount;
}
int getMaxCount(vector<int>& A)
{
// Count how often the maximum value exists.
int maxCount = 1;
int pos = A.size() - 2;
while (pos >= 0 && A[pos--] == A[A.size() - 1])
maxCount++;
return maxCount;
}
If you think about the algorithm, then you will come to the conclusion, that it actually calculates the median of the values in the array A. As example input I took the first example given by #PaulHankin. As expected, the code provides the correct result (3) for it.
I hope my approach helps you to understand how to tackle such kind of problems even if you don't know the correct solution. This is especially helpful when you are in an interview, for example.

Finding missing number using binary search

I am reading book on programming pearls.
Question: Given a sequential file that contains at most four billion
32 bit integers in random order, find a 32-bit integer that isn't in
the file (and there must be at least one missing). This problem has to
be solved if we have a few hundred bytes of main memory and several
sequential files.
Solution: To set this up as a binary search we have to define a range,
a representation for the elements within the range, and a probing
method to determine which half of a range holds the missing integer.
How do we do this?
We'll use as the range a sequence of integers known to contain atleast
one missing element, and we'll represent the range by a file
containing all the integers in it. The insight is that we can probe a
range by counting the elements above and below its midpoint: either
the upper or the lower range has atmost half elements in the total
range. Because the total range has a missing element, the smaller half
must also have a mising element. These are most ingredients of a
binary search algorithm for above problem.
Above text is copy right of Jon Bently from programming pearls book.
Some info is provided at following link
"Programming Pearls" binary search help
How do we search by passes using binary search and also not followed with the example given in above link? Please help me understand logic with just 5 integers rather than million integers to understand logic.
Why don't you re-read the answer in the post "Programming Pearls" binary search help. It explains the process on 5 integers as you ask.
The idea is that you parse each list and break it into 2 (this is where binary part comes from) separate lists based on the value in the first bit.
I.e. showing binary representation of actual numbers
Original List "": 001, 010, 110, 000, 100, 011, 101 => (broken into)
(we remove the first bit and append it to the "name" of the new list)
To form each of the bellow lists we took values starting with [0 or 1] from the list above
List "0": 01, 10, 00, 11 (is formed from subset 001, 010, 000, 011 of List "" by removing the first bit and appending it to the "name" of the new list)
List "1": 10, 00, 01 (is formed from subset 110, 100, 101 of List "" by removing the first bit and appending it to the "name" of the new list)
Now take one of the resulting lists in turn and repeat the process:
List "0" becomes your original list and you break it into
List "0***0**" and
List "0***1**" (the bold numbers are again the 1 [remaining] bit of the numbers in the list being broken)
Carry on until you end up with the empty list(s).
EDIT
Process step by step:
List "": 001, 010, 110, 000, 100, 011, 101 =>
List "0": 01, 10, 00, 11 (from subset 001, 010, 000, 011 of the List "") =>
List "00": 1, 0 (from subset 01, 00 of the List "0") =>
List "000": 0 [final result] (from subset 0 of the List "00")
List "001": 1 [final result] (from subset 1 of the List "00")
List "01": 0, 1 (from subset 10, 11 of the List "0") =>
List "010": 0 [final result] (from subset 0 of the List "01")
List "011": 1 [final result] (from subset 1 of the List "01")
List "1": 10, 00, 01 (from subset 110, 100, 101 of the List "") =>
List "10": 0, 1 (from subset 00, 01 of the List "1") =>
List "100": 0 [final result] (from subset 0 of the List "10")
List "101": 1 [final result] (from subset 1 of the List "10")
List "11": 0 (from subset 10 of the List "1") =>
List "110": 0 [final result] (from subset 0 of the List "11")
List "111": absent [final result] (from subset EMPTY of the List "11")
The positive of this method is that it will allow you to find ANY number of missing numbers in the set - i.e. if more than one is missing.
P.S. AFAIR for 1 single missing number out of the complete range there is even more elegant solution of XOR all numbers.
The idea is to solve easier problem:
Is the missing value in range [minVal, X] or (X, maxVal).
If you know this, you can move X and check again.
For example, you have 3, 4, 1, 5 (2 is missing).
You know that minVal = 1, maxVal = 5.
Range = [1, 5], X = 3, there should be 3 integers in range [1, 3] and 2 in range [4, 5]. There are only 2 in range [1, 3], so you are looking in range [1, 3]
Range = [1, 3], X = 2. There are only 1 value in range [1, 2], so you are looking in range [1, 2]
Range = [1, 2], X = 1. There are no values in range [2, 2] so it is your answer.
EDIT: Some pseudo-C++ code:
minVal = 1, maxVal = 5; //choose correct values
while(minVal < maxVal){
int X = (minVal + maxVal) / 2
int leftNumber = how much in range [minVal, X]
int rightNumber = how much in range [X + 1, maxVal]
if(leftNumber < (X - minVal + 1))maxVal = X
else minVal = X + 1
}
Here's a simple C solution which should illustrate the technique. To abstract away any tedious file I/O details, I'm assuming the existence of the following three functions:
unsigned long next_number (void) reads a number from the file and returns it. When called again, the next number in the file is returned, and so on. Behavior when the end of file is encountered is undefined.
int numbers_left (void) returns a true value if there are more numbers available to be read using next_number(), false if the end of the file has been reached.
void return_to_start (void) rewinds the reading position to the start of the file, so that the next call to next_number() returns the first number in the file.
I'm also assuming that unsigned long is at least 32 bits wide, as required for conforming ANSI C implementations; modern C programmers may prefer to use uint32_t from stdint.h instead.
Given these assumptions, here's the solution:
unsigned long count_numbers_in_range (unsigned long min, unsigned long max) {
unsigned long count = 0;
return_to_start();
while ( numbers_left() ) {
unsigned long num = next_number();
if ( num >= min && num <= max ) {
count++;
}
}
return count;
}
unsigned long find_missing_number (void) {
unsigned long min = 0, max = 0xFFFFFFFF;
while ( min < max ) {
unsigned long midpoint = min + (max - min) / 2;
unsigned long count = count_numbers_in_range( min, midpoint );
if ( count < midpoint - min + 1 ) {
max = midpoint; // at least one missing number below midpoint
} else {
min = midpoint; // no missing numbers below midpoint, must be above
}
}
return min;
}
One detail to note is that min + (max - min) / 2 is the safe way to calculate the average of min and max; it won't produce bogus results due to overflowing intermediate values like the seemingly simpler (min + max) / 2 might.
Also, even though it would be tempting to solve this problem using recursion, I chose an iterative solution instead for two reasons: first, because it (arguably) shows more clearly what's actually being done, and second, because the task was to minimize memory use, which presumably includes the stack too.
Finally, it would be easy to optimize this code, e.g. by returning as soon as count equals zero, by counting the numbers in both halves of the range in one pass and choosing the one with more missing numbers, or even by extending the binary search to n-ary search for some n > 2 to reduce the number of passes. However, to keep the example code as simple as possible, I've left such optimizations unmade. If you like, you may want to, say, try modifying the code so that it requires at most eight passes over the file instead of the current 32. (Hint: use a 16-element array.)
Actually, if we have range of integers from a to b. Sample: [a..b].
And in this range we have b-a integers. It means, that only one is missing.
And if only one is missing, we can calculate result using only single cycle.
First we can calculate sum of all integers in range [a..b], which equals:
sum = (a + b) * (b - a + 1) / 2
Then we calcualate summ of all integers in our sequence:
long sum1 = 0;
for (int i = 0; i < b - a; i++)
sum1 += arr[i];
Then we can find missing element as difference of those two sums:
long result = sum1 - sum;
when you've seen 2^31 zeros or ones in the ith digit place then your answer has a one or zero in the ith place. (Ex: 2^31 ones in 5th binary position means the answer has a zero in the 5th binary position.
First draft of c code:
uint32_t binaryHistogram[32], *list4BILLION, answer, placesChecked[32];
uint64_t limit = 4294967296;
uint32_t halfLimit = 4294967296/2;
int i, j, done
//General method to point to list since this detail is not important to the question.
list4BILLION = 0000000000h;
//Initialize array to zero. This array represents the number of 1s seen as you parse through the list
for(i=0;i<limit;i++)
{
binaryHistogram[i] = 0;
}
//Only sum up for first half of the 4 billion numbers
for(i=0;i<halfLimit;i++)
{
for(j=0;j<32;j++)
{
binaryHistogram[j] += ((*list4BILLION) >> j);
}
}
//Check each ith digit to see if all halfLimit values have been parsed
for(i=halfLimit;i<limit;i++)
{
for(j=0;j<32;j++)
{
done = 1; //Dont need to continue to the end if placesChecked are all
if(placesChecked[j] != 0) //Dont need to pass through the whole list
{
done = 0; //
binaryHistogram[j] += ((*list4BILLION) >> j);
if((binaryHistogram[j] > halfLimit)||(i - binaryHistogram[j] == halfLimit))
{
answer += (1 << j);
placesChecked[j] = 1;
}
}
}
}

Algorithm to determine coin combinations

I was recently faced with a prompt for a programming algorithm that I had no idea what to do for. I've never really written an algorithm before, so I'm kind of a newb at this.
The problem said to write a program to determine all of the possible coin combinations for a cashier to give back as change based on coin values and number of coins. For example, there could be a currency with 4 coins: a 2 cent, 6 cent, 10 cent and 15 cent coins. How many combinations of this that equal 50 cents are there?
The language I'm using is C++, although that doesn't really matter too much.
edit: This is a more specific programming question, but how would I analyze a string in C++ to get the coin values? They were given in a text document like
4 2 6 10 15 50
(where the numbers in this case correspond to the example I gave)
This problem is well known as coin change problem. Please check this and this for details. Also if you Google "coin change" or "dynamic programming coin change" then you will get many other useful resources.
Here's a recursive solution in Java:
// Usage: int[] denoms = new int[] { 1, 2, 5, 10, 20, 50, 100, 200 };
// System.out.println(ways(denoms, denoms.length, 200));
public static int ways(int denoms[], int index, int capacity) {
if (capacity == 0) return 1;
if (capacity < 0 || index <= 0 ) return 0;
int withoutItem = ways(denoms, index - 1, capacity);
int withItem = ways(denoms, index, capacity - denoms[index - 1]);
return withoutItem + withItem;
}
This seems somewhat like a Partition, except that you don't use all integers in 1:50. It also seems similar to bin packing problem with slight differences:
Wikipedia: Partition (Number Theory)
Wikipedia: Bin packing problem
Wolfram Mathworld: Partiton
Actually, after thinking about it, it's an ILP, and thus NP-hard.
I'd suggest some dynamic programming appyroach. Basically, you'd define a value "remainder" and set it to whatever your goal was (say, 50). Then, at every step, you'd do the following:
Figure out what the largest coin that can fit within the remainder
Consider what would happen if you (A) included that coin or (B) did not include that coin.
For each scenario, recurse.
So if remainder was 50 and the largest coins were worth 25 and 10, you'd branch into two scenarios:
1. Remainder = 25, Coinset = 1x25
2. Remainder = 50, Coinset = 0x25
The next step (for each branch) might look like:
1-1. Remainder = 0, Coinset = 2x25 <-- Note: Remainder=0 => Logged
1-2. Remainder = 25, Coinset = 1x25
2-1. Remainder = 40, Coinset = 0x25, 1x10
2-2. Remainder = 50, Coinset = 0x25, 0x10
Each branch would split into two branches unless:
the remainder was 0 (in which case you would log it)
the remainder was less than the smallest coin (in which case you would discard it)
there were no more coins left (in which case you would discard it since remainder != 0)
If you have 15, 10, 6 and 2 cents coins and you need to find how many distinct ways are there to arrive to 50 you can
count how many distinct ways you have to reach 50 using only 10, 6 and 2
count how many distinct ways you have to reach 50-15 using only 10, 6 and 2
count how many distinct ways you have to reach 50-15*2 using only 10, 6 and 2
count how many distinct ways you have to reach 50-15*3 using only 10, 6 and 2
Sum up all these results that are of course distinct (in the first I used no 15c coins, in the second I used one, in the third two and in the fourth three).
So you basically can split the problem in smaller problems (possibly smaller amount and fewer coins). When you have just one coin type the answer is of course trivial (either you cannot reach the prescribed amount exactly or you can in the only possible way).
Moreover you can also avoid repeating the same computation by using memoization, for example the number of ways of reach 20 using only [6, 2] doesn't depend if the already paid 30 have been reached using 15+15 or 10+10+10, so the result of the smaller problem (20, [6, 2]) can
be stored and reused.
In Python the implementation of this idea is the following
cache = {}
def howmany(amount, coins):
prob = tuple([amount] + coins) # Problem signature
if prob in cache:
return cache[prob] # We computed this before
if amount == 0:
return 1 # It's always possible to give an exact change of 0 cents
if len(coins) == 1:
if amount % coins[0] == 0:
return 1 # We can match prescribed amount with this coin
else:
return 0 # It's impossible
total = 0
n = 0
while n * coins[0] <= amount:
total += howmany(amount - n * coins[0], coins[1:])
n += 1
cache[prob] = total # Store in cache to avoid repeating this computation
return total
print howmany(50, [15, 10, 6, 2])
As for the second part of your question, suppose you have that string in the file coins.txt:
#include <fstream>
#include <vector>
#include <algorithm>
#include <iterator>
int main() {
std::ifstream coins_file("coins.txt");
std::vector<int> coins;
std::copy(std::istream_iterator<int>(coins_file),
std::istream_iterator<int>(),
std::back_inserter(coins));
}
Now the vector coins will contain the possible coin values.
For such a small number of coins you can write a simple brute force solution.
Something like this:
#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;
vector<int> v;
int solve(int total, int * coins, int lastI)
{
if (total == 50)
{
for (int i = 0; i < v.size(); i++)
{
cout << v.at(i) << ' ';
}
cout << "\n";
return 1;
}
if (total > 50) return 0;
int sum = 0;
for (int i = lastI; i < 6; i++)
{
v.push_back(coins[i]);
sum += solve(total + coins[i], coins, i);
v.pop_back();
}
return sum;
}
int main()
{
int coins[6] = {2, 4, 6, 10, 15, 50};
cout << solve(0, coins, 0) << endl;
}
A very dirty brute force solution that prints all possible combinations.
This is a very famous problem, so try reading about better solutions others have provided.
One rather dumb approach is the following. You build a mapping "coin with value X is used Y times" and then enumerate all possible combinations and only select those which total the desired sum. Obviously for each value X you have to check Y ranging from 0 up to the desired sum. This will be rather slow, but will solve your task.
It's very similar to the knapsack problem
You basically have to solve the following equation: 50 = a*4 + b*6 + c*10 + d*15, where the unknowns are a,b,c,d. You can compute for instance d = (50 - a*4 - b*6 - c*10)/15 and so on for each variable. Then, you start giving d all the possible values (you should start with the one that has the least possible values, here d): 0,1,2,3,4 and than start giving c all the possible values depending on the current value of d and so on.
Sort the List backwards: [15 10 6 4 2]
Now a solution for 50 ct can contain 15 ct or not.
So the number of solutions is the number of solutions for 50 ct using [10 6 4 2] (no longer considering 15 ct coins) plus the number of solutions for 35 ct (=50ct - 15ct) using [15 10 6 4 2]. Repeat the process for both sub-problems.
An algorithm is a procedure for solving a problem, it doesn't have to be in any particular language.
First work out the inputs:
typedef int CoinValue;
set<CoinValue> coinTypes;
int value;
and the outputs:
set< map<CoinValue, int> > results;
Solve for the simplest case you can think of first:
coinTypes = { 1 }; // only one type of coin worth 1 cent
value = 51;
the result should be:
results = { [1 : 51] }; // only one solution, 51 - 1 cent coins
How would you solve the above?
How about this:
coinTypes = { 2 };
value = 51;
results = { }; // there is no solution
what about this?
coinTypes = { 1, 2 };
value = { 4 };
results = { [2: 2], [2: 1, 1: 2], [1: 4] }; // the order I put the solutions in is a hint to how to do the algorithm.
Recursive solution based on algorithmist.com resource in Scala:
def countChange(money: Int, coins: List[Int]): Int = {
if (money < 0 || coins.isEmpty) 0
else if (money == 0) 1
else countChange(money, coins.tail) + countChange(money - coins.head, coins)
}
Another Python version:
def change(coins, money):
return (
change(coins[:-1], money) +
change(coins, money - coins[-1])
if money > 0 and coins
else money == 0
)