I've been tasked with helping some accountants solve a common problem they have: given a list of transactions and a total deposit, which transactions are part of the deposit? For example, say I have this list of numbers:
1.00
2.50
3.75
8.00
If I know that my total deposit is 10.50, I can easily see that it's made up of the 8.00 and 2.50 transactions. However, given a hundred transactions and a deposit in the millions, it quickly becomes much more difficult.
In testing a brute force solution (which takes way too long to be practical), I had two questions:
1. With a list of about 60 numbers, it seems to find a dozen or more combinations for any reasonable total. I was expecting a single combination to satisfy my total, or maybe a few possibilities, but there always seem to be a ton of combinations. Is there a math principle that describes why this is? It seems that, given a collection of random numbers of even medium size, you can find multiple combinations that add up to just about any total you want.
2. I built a brute force solution for the problem, but it's clearly O(n!) and quickly grows out of control. Aside from the obvious shortcuts (such as excluding numbers that are themselves larger than the total), is there a way to shorten the time it takes to calculate this?
Details on my current (super-slow) solution:
The list of detail amounts is sorted largest to smallest, and then the following process runs recursively:
Take the next item in the list and see if adding it to your running total makes your total match the target. If it does, set aside the current chain as a match. If it falls short of your target, add it to your running total, remove it from the list of detail amounts, and then call this process again.
This way it excludes the larger numbers quickly, cutting the list down to only the numbers it needs to consider. However, it's still n!, and larger lists never seem to finish, so I'm interested in any shortcuts I might be able to take to speed this up. I suspect that even cutting one number out of the list would cut the calculation time in half.
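As a rough C++ sketch of the process just described (not the original code, which isn't shown), assuming the amounts have already been converted to positive integer cents: it walks the list largest first, tries each remaining item in turn, and adds two cheap prunes, skipping items bigger than what is left and giving up once even all remaining items together cannot reach the target. The name findMatches is illustrative.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Finds every subset of `cents` (positive amounts in integer cents) that sums
// exactly to `target`. Each match lists the chosen amounts, largest first.
std::vector<std::vector<long long>> findMatches(std::vector<long long> cents,
                                                long long target) {
    // Sort largest to smallest, as in the described process.
    std::sort(cents.begin(), cents.end(), std::greater<long long>());

    // suffix[i] = sum of cents[i..end): lets us stop as soon as even taking
    // every remaining amount could not reach what is still needed.
    std::vector<long long> suffix(cents.size() + 1, 0);
    for (std::size_t i = cents.size(); i-- > 0;) suffix[i] = suffix[i + 1] + cents[i];

    std::vector<std::vector<long long>> matches;
    std::vector<long long> chain;  // the current chain of chosen amounts

    std::function<void(std::size_t, long long)> search =
        [&](std::size_t start, long long remaining) {
            if (remaining == 0) {                    // chain hits the target exactly
                matches.push_back(chain);
                return;
            }
            if (suffix[start] < remaining) return;   // target no longer reachable
            for (std::size_t i = start; i < cents.size(); ++i) {
                if (cents[i] > remaining) continue;  // too big to fit, skip it
                chain.push_back(cents[i]);
                search(i + 1, remaining - cents[i]); // only later items may follow
                chain.pop_back();
            }
        };
    search(0, target);
    return matches;
}
```

Because each item is only ever followed by items after it, every subset is visited at most once, so the worst case is on the order of 2^n chains rather than n! orderings. The prunes often help a lot on real data but do not remove that exponential worst case.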
Thanks for your help!
This special case of the Knapsack problem is called Subset Sum.
C# version
Test setup:
using System;
using System.Collections.Generic;
public class Program
{
public static void Main(string[] args)
{
// subtotal list
List<double> totals = new List<double>(new double[] { 1, -1, 18, 23, 3.50, 8, 70, 99.50, 87, 22, 4, 4, 100.50, 120, 27, 101.50, 100.50 });
// get matches
List<double[]> results = Knapsack.MatchTotal(100.50, totals);
// print results
foreach (var result in results)
{
Console.WriteLine(string.Join(",", result));
}
Console.WriteLine("Done.");
Console.ReadKey();
}
}
code:
using System.Collections.Generic;
using System.Linq;
public class Knapsack
{
internal static List<double[]> MatchTotal(double theTotal, List<double> subTotals)
{
List<double[]> results = new List<double[]>();
while (subTotals.Contains(theTotal))
{
results.Add(new double[1] { theTotal });
subTotals.Remove(theTotal);
}
// if no subtotals were passed
// or all matched the Total
// return
if (subTotals.Count == 0)
return results;
subTotals.Sort();
double mostNegativeNumber = subTotals[0];
if (mostNegativeNumber > 0)
mostNegativeNumber = 0;
// if there aren't any negative values
// we can remove any values bigger than the total
if (mostNegativeNumber == 0)
subTotals.RemoveAll(d => d > theTotal);
// if there aren't any negative values
// and sum is less than the total no need to look further
if (mostNegativeNumber == 0 && subTotals.Sum() < theTotal)
return results;
// get the combinations for the remaining subTotals
// skip 1 since we already removed subTotals that match
for (int choose = 2; choose <= subTotals.Count; choose++)
{
// get combinations for each length
IEnumerable<IEnumerable<double>> combos = Combination.Combinations(subTotals.AsEnumerable(), choose);
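// add combinations whose sum matches the total to the result list
// (exact == on doubles only works here because every test value is a multiple of 0.5;
// use decimal or integer cents for real money)
results.AddRange(from combo in combos
where combo.Sum() == theTotal
select combo.ToArray());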
}
return results;
}
}
public static class Combination
{
public static IEnumerable<IEnumerable<T>> Combinations<T>(this IEnumerable<T> elements, int choose)
{
return choose == 0 ? // if choose = 0
new[] { new T[0] } : // return empty Type array
elements.SelectMany((element, i) => // else recursively iterate over array to create combinations
elements.Skip(i + 1).Combinations(choose - 1).Select(combo => (new[] { element }).Concat(combo)));
}
}
results:
100.5
100.5
-1,101.5
1,99.5
3.5,27,70
3.5,4,23,70
3.5,4,23,70
-1,1,3.5,27,70
1,3.5,4,22,70
1,3.5,4,22,70
1,3.5,8,18,70
-1,1,3.5,4,23,70
-1,1,3.5,4,23,70
1,3.5,4,4,18,70
-1,3.5,8,18,22,23,27
-1,3.5,4,4,18,22,23,27
Done.
If subTotals are repeated, there will appear to be duplicate results (the desired effect). In practice, you will probably want to pair each subTotal with some ID (for example, in a tuple) so you can relate each match back to your data.
If I understand your problem correctly, you have a set of transactions, and you merely wish to know which of them could have been included in a given total. So if there are 4 possible transactions, then there are 2^4 = 16 possible sets to inspect. The problem is that for 100 possible transactions, the search space has 2^100 = 1267650600228229401496703205376 possible combinations to search over. For 1000 potential transactions, it grows to a total of
10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376
sets that you must test. Brute force will hardly be a viable solution on these problems.
Instead, use a solver that can handle knapsack problems. But even then, I'm not sure that you can generate a complete enumeration of all possible solutions without some variation of brute force.
There is a cheap Excel Add-in that solves this problem: SumMatch
The Excel Solver Addin as posted over on superuser.com has a great solution (if you have Excel) https://superuser.com/questions/204925/excel-find-a-subset-of-numbers-that-add-to-a-given-total
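It's similar to the 0-1 knapsack problem, which is NP-complete but can be solved with dynamic programming in pseudo-polynomial time: polynomial in the number of items and in the numeric value of the target, not in the size of the input.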
http://en.wikipedia.org/wiki/Knapsack_problem
However, at the end of the algorithm you also need to check that the best sum found is exactly the one you wanted.
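A minimal C++ sketch of that dynamic-programming idea for the exact-sum question, assuming non-negative integer amounts (such as cents). reach[i][s] records whether some subset of the first i amounts sums to s, so the work is proportional to n × target, which is what "pseudo-polynomial" means here: fine for thousands of cents, heavy for a target in the millions of cents. The final lookup of reach[n][target] is the "check the sum" step. The function name findOneSubset is illustrative, and it returns only one matching subset, not all of them.

```cpp
#include <cstddef>
#include <vector>

// Returns the indices of one subset of `amounts` (non-negative integers, e.g.
// cents) that sums exactly to `target`, or an empty vector if none exists.
std::vector<int> findOneSubset(const std::vector<long long>& amounts, long long target) {
    const int n = static_cast<int>(amounts.size());
    if (target <= 0) return {};
    const std::size_t width = static_cast<std::size_t>(target) + 1;

    // reach[i][s] is true if some subset of the first i amounts sums to s.
    std::vector<std::vector<bool>> reach(n + 1, std::vector<bool>(width, false));
    reach[0][0] = true;
    for (int i = 1; i <= n; ++i) {
        const long long a = amounts[i - 1];
        for (long long s = 0; s <= target; ++s) {
            // either skip amount i-1, or take it if it fits
            reach[i][s] = reach[i - 1][s] || (a <= s && reach[i - 1][s - a]);
        }
    }

    // The "check the sum" step: is the exact target reachable at all?
    if (!reach[n][target]) return {};

    // Walk back through the table to recover which amounts were taken.
    std::vector<int> chosen;
    long long s = target;
    for (int i = n; i > 0 && s > 0; --i) {
        if (!reach[i - 1][s]) {  // s needs amount i-1, so it was taken
            chosen.push_back(i - 1);
            s -= amounts[i - 1];
        }
    }
    return chosen;
}
```

For the opening example in cents (100, 250, 375, 800 with target 1050) this picks the 800 and 250 entries; a target of 1000 has no exact subset and yields an empty result.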
Depending on your data, you could first look at the cents portion of each transaction. In your initial example, you know that 2.50 has to be part of the total because it is the only set of non-zero-cent transactions whose cents add up to 50.
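A small C++ sketch of that cents filter, assuming amounts in non-negative integer cents: only items with a non-zero cents part can change the cents of the total, so it tries every subset of just those items and keeps the ones whose cents match the target modulo 100. Each result is a candidate set of "cents" items to combine with whole-dollar items; it does not solve the whole problem. The name centsCandidates is illustrative.

```cpp
#include <cstddef>
#include <vector>

// Returns every subset (as indices into `amounts`) of the items with a non-zero
// cents part whose cents add up to the target's cents, modulo 100.
// Assumes fewer than 64 such items; the loop is still exponential in that count.
std::vector<std::vector<int>> centsCandidates(const std::vector<long long>& amounts,
                                              long long target) {
    std::vector<int> odd;  // indices of items whose cents part is not .00
    for (int i = 0; i < static_cast<int>(amounts.size()); ++i)
        if (amounts[i] % 100 != 0) odd.push_back(i);

    std::vector<std::vector<int>> candidates;
    const long long want = target % 100;
    for (unsigned long long mask = 0; mask < (1ULL << odd.size()); ++mask) {
        long long cents = 0;
        std::vector<int> picked;
        for (std::size_t b = 0; b < odd.size(); ++b) {
            if (mask & (1ULL << b)) {  // bit b selects item odd[b]
                cents += amounts[odd[b]] % 100;
                picked.push_back(odd[b]);
            }
        }
        if (cents % 100 == want) candidates.push_back(picked);
    }
    return candidates;
}
```

For the question's example (100, 250, 375, 800 cents, target 1050), the only candidate is the 2.50 item on its own, which matches the reasoning above.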
Not a super efficient solution, but here's an implementation in CoffeeScript.
combinations returns all possible non-empty combinations of the elements in list (up to 30 elements, since JavaScript bitwise operators work on 32-bit integers):
combinations = (list) ->
  # each bit pattern from 2^n - 1 down to 1 selects one non-empty subset
  mask = Math.pow(2, list.length) - 1
  # local name, so the outer combinations function is not overwritten
  result = []
  while mask
    out = []
    for i in [0...list.length]
      # keep list[i] when bit i of the mask is set
      out.push(list[i]) if mask & (1 << i)
    result.push(out)
    mask--
  return result
Then find_components uses it to find the first combination whose numbers add up to total:
find_components = (total, list) ->
  # given a list that is assumed to have only unique elements
  list_combinations = combinations(list)
  for combination in list_combinations
    sum = 0
    for number in combination
      sum += number
    # exact float comparison; integer cents would be safer
    return combination if sum is total
  return []
Here's an example:
list = [7.2, 3.3, 4.5, 6.0, 2, 4.1]
total = 7.2 + 2 + 4.1
console.log(find_components(total, list))
which returns [ 7.2, 2, 4.1 ]
#include <stdio.h>
#include <stdlib.h>
/* Takes at least 3 numbers as arguments.
* First number is desired sum.
* Find the subset of the rest that comes closest
* to the desired sum without going over.
*/
static long *elements;
static int nelements;
/* A linked list of some elements, not necessarily all */
/* The list represents the optimal subset for elements in the range [index..nelements-1] */
struct status {
long sum; /* sum of all the elements in the list */
struct status *next; /* points to next element in the list */
int index; /* index into elements array of this element */
};
/* find the subset of elements[startingat .. nelements-1] whose sum is closest to but does not exceed desiredsum */
struct status *reportoptimalsubset(long desiredsum, int startingat) {
struct status *sumcdr = NULL;
struct status *sumlist = NULL;
/* sum of zero elements or summing to zero */
if (startingat == nelements || desiredsum == 0) {
return NULL;
}
/* optimal sum using the current element */
/* if current elements[startingat] too big, it won't fit, don't try it */
if (elements[startingat] <= desiredsum) {
sumlist = malloc(sizeof(struct status));
sumlist->index = startingat;
sumlist->next = reportoptimalsubset(desiredsum - elements[startingat], startingat + 1);
sumlist->sum = elements[startingat] + (sumlist->next ? sumlist->next->sum : 0);
if (sumlist->sum == desiredsum)
return sumlist;
}
/* optimal sum not using current element */
sumcdr = reportoptimalsubset(desiredsum, startingat + 1);
if (!sumcdr) return sumlist;
if (!sumlist) return sumcdr;
return (sumcdr->sum < sumlist->sum) ? sumlist : sumcdr;
}
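int main(int argc, char **argv) {
struct status *result = NULL;
/* need at least the desired sum and one element */
if (argc < 3) {
fprintf(stderr, "usage: %s desiredsum n1 [n2 ...]\n", argv[0]);
return 1;
}
long desiredsum = strtol(argv[1], NULL, 10);
nelements = argc - 2;
elements = malloc(sizeof(long) * nelements);
for (int i = 0; i < nelements; i++) {
elements[i] = strtol(argv[i + 2], NULL, 10);
}
result = reportoptimalsubset(desiredsum, 0);
if (!result) {
printf("no subset fits\n");
return 0;
}
printf("optimal subset = %ld\n", result->sum);
/* print the chosen elements separated by " + " */
for (; result; result = result->next) {
printf("%ld%s", elements[result->index], result->next ? " + " : "\n");
}
return 0;
}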
By the way, it's best to avoid floats and doubles when doing arithmetic and equality comparisons on money; integer cents are safer.
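A tiny C++ illustration of that advice, assuming amounts have at most two decimal places: round each amount to whole cents once, then do all sums and comparisons on integers. toCents is a hypothetical helper; std::llround is the standard <cmath> function.

```cpp
#include <cmath>

// Converts a currency amount to whole cents by rounding to the nearest cent,
// so later sums and == comparisons are exact integer operations.
// Assumes the amount has at most two decimal places.
long long toCents(double amount) {
    return std::llround(amount * 100.0);
}
```

With doubles, 0.1 + 0.2 == 0.3 is false, but toCents(0.1) + toCents(0.2) == toCents(0.3) holds, so matching a deposit total becomes an exact comparison.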
I have a program that writes one random number to a file.
#include <iostream>
#include <fstream>
#include <cstdlib>
using namespace std;
int main() {
std::ofstream file("file.txt",std::ios_base::app);
int var = rand() % 100 + 1;
file<<var ;
return 0;
}
Results after 4 trials:
1,2 2,20 3,40 1,88
I don't want to write the numbers themselves, only their updated average after each try.
Is there any way to calculate the average incrementally?
The file should contain only the average value.
For example, first trial:
1.2
The second trial writes the average (1.2+2.2)/2 to the file:
1.7
Even though what you are trying to do is a bit strange, and I'm sure there is a better way of doing it, here's how you can do it:
float onTheFlyAverage()
{
static int nCount=0;
float avg, newAvg;
int newNumber = getRandomNum();
nCount++; //increment the total number count
avg = readLastAvgFromFile(); //this will read the last average in your file
newAvg = avg*(nCount-1)/nCount+ (float)(newNumber)/nCount;
return newAvg;
}
If for some reason you want to keep an average in a file that you provide as input to your program, and expect it to keep averaging numbers across runs (a sort of stop-and-continue feature), you will have to save and load the total number count in the file as well as the average.
But if you do it in one go, this should work. IMHO this is far from the best way of doing it, but there you have it :)
NOTE: there is a divide by 0 corner-case I did not take care of; I leave that up to you.
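Following the suggestion to store the count next to the average, here is a C++ sketch of the stop-and-continue version. Its assumptions: the file holds "<count> <average>" on one line (so it no longer contains only the average, as the question wanted), and the names RunningAverage, update, load and save are hypothetical. update uses the rearranged form avg += (x - avg) / count, which is algebraically the same as avg*(n-1)/n + x/n without multiplying back.

```cpp
#include <fstream>
#include <string>

// State persisted between runs (assumed file format: "<count> <average>").
struct RunningAverage {
    long long count = 0;
    double average = 0.0;
};

// Incremental mean: count is incremented before dividing, so no divide by zero.
RunningAverage update(RunningAverage state, double x) {
    state.count += 1;
    state.average += (x - state.average) / static_cast<double>(state.count);
    return state;
}

// Reads the saved state; a missing or unreadable file starts from scratch.
RunningAverage load(const std::string& path) {
    RunningAverage state;
    std::ifstream in(path);
    if (!(in >> state.count >> state.average)) state = RunningAverage{};
    return state;
}

// Overwrites the file with the new state; 17 significant digits so repeated
// runs do not accumulate rounding from the default 6-digit output.
void save(const std::string& path, const RunningAverage& state) {
    std::ofstream out(path, std::ios::trunc);
    out.precision(17);
    out << state.count << ' ' << state.average << '\n';
}
```

In the question's program, main would call load, then update with the new random number, then save, once per run.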
You can use some simple math to calculate the mean incrementally. However, you have to keep count of how many values contribute to the mean.
Let's say you have n numbers with a mean value of m. Your next number x contributes to the mean in the following way:
m = (mn + x)/(n+1)
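Plugging the question's first two trials into this formula (n = 1, m = 1.2, x = 2.2) gives the expected 1.7. The right-hand form, obtained by writing mn + x as m(n + 1) + (x - m), updates the mean without forming the full sum:

```latex
m_{\text{new}} = \frac{m n + x}{n + 1} = m + \frac{x - m}{n + 1}
\qquad
n = 1,\; m = 1.2,\; x = 2.2:\quad
m_{\text{new}} = \frac{1.2 \cdot 1 + 2.2}{2} = 1.2 + \frac{2.2 - 1.2}{2} = 1.7
```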
It's bad for performance to divide and then multiply the average back. I suggest you store the sum and the count instead.
(pseudocode)
// these variables are stored between function calls
int sum = 0
int n = 0
function float getNextRandomAverage() {
    int rnd = getOneRandom()
    n++
    sum += rnd
    float avg = (float)sum / n   // convert first; int / int would truncate
    return avg
}
function writeNextRandomAverage() {
    writeToFile(getNextRandomAverage())
}
Also, it seems strange to me that your method closes the file. How does it know that it should close it? What if the file is needed later (say, on consecutive calls of this method)?
I mistook arrays for vectors, sorry (an array is "vektor" in Swedish).
I would need some help with a program I'm making. It is an assignment, so I really need to understand how to do this and not just get the code :P
I need to make an array containing 10 numbers (I would like to make them editable while the program is running).
After I've done this, I need to make the program calculate the average value of all the numbers.
It would be pretty neat if you could also pick how many numbers you want the average of, if anyone could share some knowledge on how I should do that :P
Anyway, I've tried some code to make the vector that didn't work; I might as well add it here:
int vector[10];
and
vector[0] "number 1: ";
and so on for the input of numbers in the vector.
int sum = vector[0] + vector[1] + ...
cout << "average value is: " << sum/5;
should work for getting the average value though (right?)
I should also add:
float average(int v[], int n)
to this as well, but I can't really see how.
Any help/knowledge at all would be awesome! Cheers.
To pick how many numbers you want to average:
Native (G++/Clang only, not "legal" standard C++, since variable-length arrays are a compiler extension):
cin >> num;
int vector[num];
"Correct" native (pointers):
int *vector = new int [num]; // release it later with delete[] vector;
"Proper" C++:
#include <vector>
std::vector<int> v(num);
A function like the following would work for computing the average of an array containing n elements.
float average(int v[], int n)
{
float sum = 0;
for(int i = 0 ; i < n ; i++)
{
sum += v[i]; //sum all the numbers in the vector v
}
return sum / n;
}
You can declare your array as you have done; however, I do recommend naming it something other than vector to avoid confusion. As for changing the numbers in the array, you can do this by, for example, making a loop from 0 to 9 and having the user enter a value for each element.
"Vektor" in Swedish = "array" in English (a vector is something else :))
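If you want exactly 10 numbers, you can eliminate some overhead by simply using an array. However, assuming you want to use a vector, you can easily find the average by taking advantage of its size() member, like so:
float average(const std::vector<int>& nums)
{
    if (nums.empty()) return 0.0f; // avoid dividing by zero
    int sum = 0;
    for (unsigned int i = 0; i < nums.size(); i++)
        sum += nums[i];
    // cast before dividing: int / size_t would be integer division, and a
    // negative sum would be converted to a huge unsigned value
    return static_cast<float>(sum) / nums.size();
}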
Note that this assumes the sum won't exceed 2^31-1, i.e. the highest number a signed 32-bit int can represent. To be safer, you could use a 64-bit int for sum, or an arbitrary-precision library like GMP, but I'd assume that is outside the scope of your assignment.
1. Declare an array of size 10 (which you have done).
2. Use a loop to get ten inputs from the user (a for or while loop would do).
3. Use another loop to calculate the sum of all ten numbers and store it in a variable.
4. Divide that variable by ten.
This is essentially what you need to do. But to make your driver program prettier, you can define the following functions:
void GetInput(int *A); //put the input loop here
You can also write either one of these two functions:
long Sum(int *A); //put the summing loop here
double Average(int *A); //put the summing loop here AND divide the sum by ten
Since you are a beginner, I feel obliged to tell you that you don't need to return an array: an array is passed as a pointer to its first element, so the function modifies the caller's array directly. I did not bother to pass the array size as a parameter to any of the functions because it is fixed and known to be 10, but it would be good practice to do so.
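Putting those steps together, here is a C++ sketch that also lets the user choose how many numbers to average, as the question asked. It uses std::vector instead of a fixed array so the count can be decided at run time; the GetInput/Average names follow the answer above, but these signatures (a stream and a count for input, a vector for the average) are assumptions.

```cpp
#include <iostream>
#include <vector>

// Reads `count` integers from `in`, prompting for each one. The count is chosen
// at run time, so the user can pick how many numbers to average.
std::vector<int> GetInput(std::istream& in, int count) {
    std::vector<int> values(count);
    for (int i = 0; i < count; ++i) {
        std::cout << "number " << (i + 1) << ": ";
        in >> values[i];
    }
    return values;
}

// Sums in a long long to reduce overflow risk, then divides as double so the
// fractional part is kept.
double Average(const std::vector<int>& values) {
    if (values.empty()) return 0.0;  // avoid dividing by zero
    long long sum = 0;
    for (int v : values) sum += v;
    return static_cast<double>(sum) / values.size();
}
```

A main function could read count from std::cin, call GetInput(std::cin, count), and print Average(values). Because Average returns a double, 7 and 8 average to 7.5 rather than 7.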
I've just started learning backtracking algorithms at college. Somehow I've managed to write a program for the Subset-Sum problem. It works fine, but then I discovered that my program doesn't give all the possible combinations.
For example, there might be a hundred combinations for a target sum, but my program gives only 30.
Here is the code. It would be a great help if anyone could point out what my mistake is.
int tot=0;//tot is the total sum of all the numbers in the set.
int prob[500], d, s[100], top = -1, n; // n = number of elements in the set. prob[i] is the array with the set.
void subset()
{
int i=0,sum=0; //sum - being updated at every iteration and check if it matches 'd'
while(i<n)
{
if((sum+prob[i] <= d)&&(prob[i] <= d))
{
s[++top] = i;
sum+=prob[i];
}
if(sum == d) // d is the target sum
{
show(); // this function just displays the integer array 's'
top = -1; // top points to the recent number added to the int array 's'
i = s[top+1];
sum = 0;
}
i++;
while(i == n && top!=-1)
{
sum-=prob[s[top]];
i = s[top--]+1;
}
}
}
int main()
{
cout<<"Enter number of elements : ";cin>>n;
cout<<"Enter required sum : ";cin>>d;
cout<<"Enter SET :\n";
for(int i=0;i<n;i++)
{
cin>>prob[i];
tot+=prob[i];
}
if(d <= tot)
{
subset();
}
return 0;
}
When I run the program :
Enter number of elements : 7
Enter the required sum : 12
Enter SET :
4 3 2 6 8 12 21
SOLUTION 1 : 4, 2, 6
SOLUTION 2 : 12
Although 4, 8 is also a solution, my program doesn't show it.
It's even worse with 100 or more inputs. There will be at least 10000 combinations, but my program shows only 100.
The logic I am trying to follow:
1. Take elements of the main SET into a subset as long as the sum of the subset remains less than or equal to the target sum.
2. If adding a particular number to the subset sum makes it larger than the target, don't take it.
3. Once it reaches the end of the set and an answer has not been found, remove the most recently taken number from the subset and start looking at the numbers in the positions after the position of the removed number (since what I store in the array 's' is the positions of the selected numbers from the main SET).
The solutions you find depend on the order of the entries in the set, because of the "as long as" clause in your step 1.
If you take entries as long as they don't get you over the target, then once you've taken e.g. '4' and '2', '8' will take you over the target, so as long as '2' comes before '8' in your set, you'll never get a subset with '4' and '8'.
You should either add a possibility to skip adding an entry (or add it to one subset but not to another) or change the order of your set and re-examine it.
It may be that a stack-free solution is possible, but the usual (and generally easiest!) way to implement backtracking algorithms is through recursion, e.g.:
int i = 0, n; // i needs to be visible to show()
int s[100];
// Considering only the subset of prob[] values whose indexes are >= start,
// print all subsets that sum to total.
void new_subsets(int start, int total) {
if (total == 0) show(); // total == 0 means we already have a solution
// Look for the next number that could fit
while (start < n && prob[start] > total) {
++start;
}
if (start < n) {
// We found a number, prob[start], that can be added without overflow.
// Try including it by solving the subproblem that results.
s[i++] = start;
new_subsets(start + 1, total - prob[start]);
i--;
// Now try excluding it by solving the subproblem that results.
new_subsets(start + 1, total);
}
}
You would then call this from main() with new_subsets(0, d);. Recursion can be tricky to understand at first, but it's important to get your head around it -- try easier problems (e.g. generating Fibonacci numbers recursively) if the above doesn't make any sense.
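For reference, a self-contained C++ version of the same include/exclude recursion that collects the subsets instead of printing them through globals and show(). It assumes all numbers are positive, so it can stop as soon as the remaining total reaches 0; subsetsFrom and allSubsets are illustrative names.

```cpp
#include <cstddef>
#include <vector>

// For each position, first try including prob[start] (if it fits), then try
// excluding it, so no combination is missed. Assumes positive numbers.
void subsetsFrom(const std::vector<int>& prob, std::size_t start, int total,
                 std::vector<int>& chosen, std::vector<std::vector<int>>& out) {
    if (total == 0) {  // chosen already sums to the target
        out.push_back(chosen);
        return;
    }
    // Skip numbers that would overshoot the remaining total.
    while (start < prob.size() && prob[start] > total) ++start;
    if (start == prob.size()) return;

    chosen.push_back(prob[start]);                     // include prob[start]
    subsetsFrom(prob, start + 1, total - prob[start], chosen, out);
    chosen.pop_back();
    subsetsFrom(prob, start + 1, total, chosen, out);  // exclude prob[start]
}

// Returns every subset of prob (as values, in input order) that sums to target.
std::vector<std::vector<int>> allSubsets(const std::vector<int>& prob, int target) {
    std::vector<std::vector<int>> out;
    std::vector<int> chosen;
    subsetsFrom(prob, 0, target, chosen, out);
    return out;
}
```

On the question's input (4 3 2 6 8 12 21, target 12) it finds {4, 2, 6}, {4, 8} and {12}, including the {4, 8} solution the original loop missed.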
Working instead with the solution you have given, one problem I can see is that as soon as you find a solution, you wipe it out and start looking for a new solution from the number to the right of the first number that was included in this solution (top = -1; i = s[top+1]; implies i = s[0], and there is a subsequent i++;). This will miss solutions that begin with the same first number. You should just do if (sum == d) { show(); } instead, to make sure you get them all.
I initially found your inner while loop pretty confusing, but I think it's actually doing the right thing: once i hits the end of the array, it will delete the last number added to the partial solution, and if this number was the last number in the array, it will loop again to delete the second-to-last number from the partial solution. It can never loop more than twice because numbers included in a partial solution are all at distinct positions.
I haven't analysed the algorithm in detail, but what struck me is that your algorithm doesn't account for the possibility that, after having one solution that starts with number X, there could be multiple solutions starting with that number.
A first improvement would be to avoid resetting your stack s and the running sum after you printed the solution.
Given a list of numbers in increasing order and a certain sum, I'm trying to implement the optimal way of reaching the sum, using the biggest number first.
A sample input would be:
3
1
2
5
11
where the first line is the number of numbers we are using and the last line is the desired sum.
the output would be:
1 x 1
2 x 5
which equals 11
I'm trying to adapt this https://www.classle.net/book/c-program-making-change-using-greedy-method to read from standard input.
Here is what I've got so far:
#include <iostream>
using namespace std;
int main()
{
int sol = 0; int array[]; int m[10];
while (!cin.eof())
{
cin >> array[i]; // add inputs to an array
i++;
}
x = array[0]; // number of
for (int i; i < x ; i++) {
while(sol<array[x+1]){
// try to check all multiplications of the largest number until its over the sum
// save the multiplication number into the m[] before it goes over the sum;
//then do the same with the second highest number and check if they can add up to sum
}
cout << m[//multiplication number] << "x" << array[//correct index]
return 0;
}
if(sol!=array[x+1])
{
cout<<endl<<"Not Possible!";
}
}
I'm finding it hard to come up with an efficient way to try all possible combinations starting with the biggest number. Any suggestions would be greatly appreciated, since I know I'm clearly off.
The problem is a variation of the subset sum problem, which is NP-Hard.
An NP-hard problem is, among other things, one for which no polynomial-time solution is known. A simple greedy rule like "take the highest first" runs in polynomial time, so it cannot be correct for every input (unless P = NP).
However, for this NP-hard problem there is a pseudo-polynomial solution using dynamic programming. The variant where you can choose each number more than once is called the coin change problem.
This page contains explanation and possible solutions for the problem.
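As a sketch of the pseudo-polynomial approach (a standard formulation, not necessarily the one on that page), here is the coin change variant in C++: each number may be used any number of times, and the table finds the fewest numbers overall. coinChange is an illustrative name; it assumes positive numbers and a non-negative sum, and takes O(count × sum) time.

```cpp
#include <climits>
#include <map>
#include <vector>

// Unbounded coin change. best[s] = fewest numbers that reach sum s (INT_MAX if
// impossible); last[s] = the number used last on that optimal path.
// Returns how many times each number is used, or an empty map if impossible.
std::map<int, int> coinChange(const std::vector<int>& coins, int sum) {
    std::vector<int> best(sum + 1, INT_MAX), last(sum + 1, 0);
    best[0] = 0;
    for (int s = 1; s <= sum; ++s) {
        for (int c : coins) {
            if (c <= s && best[s - c] != INT_MAX && best[s - c] + 1 < best[s]) {
                best[s] = best[s - c] + 1;
                last[s] = c;
            }
        }
    }
    std::map<int, int> used;
    if (best[sum] == INT_MAX) return used;  // no combination reaches the sum
    // Follow the recorded choices back down to 0, counting each number used.
    for (int s = sum; s > 0; s -= last[s]) ++used[last[s]];
    return used;
}
```

For the question's sample (1, 2, 5 and sum 11) it returns 1 × 1 and 2 × 5. For the numbers 3 and 4 with sum 6, greedy takes 4 first and gets stuck at 2, while the table finds 3 + 3.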