I saw this code from the Stroustrup's book, but I can't understand how it works.
I just can't get how it increases by " 0, 1, 4, 9..."
int archaic_square(int v) {
int total = 0;
for (int i = 0; i < v; ++i) {
total += v;
}
return total;
}
int main() {
for (int i = 0; i < 100; ++i) {
cout << i << '\t' << archaic_square(i) << '\n';
}
return 0;
}
The code in archaic_square is starting total off as zero, then adding v to it v times (in the loop).
By definition, it will then end up as:
0 + v + v + … + v
\___________/
v times
which is 0 + v * v, or v2.
In more explicit detail:
adding zero to zero, zero times, gives you zero (0);
adding one to zero, once, gives you one (0, 1);
adding two to zero, two times, gives you four (0, 2, 4);
adding three to zero, three times, gives you nine (0, 3, 6, 9);
adding four to zero, four times, gives you sixteen (0, 4, 8, 12, 16);
and so on, ad infinitum.
Remember from arithmetic that multiplication is repeated just a addition (or rather a repeated addition by definition)? That's all that's happening here.
Since v is getting added v times, it is the same as v * v, or v squared.
That code calculates squares by the method of differences. It's an alternative way of evaluating functions, and has some benefits over the usual plug-in-the-values approach. It was used in Babbage's difference engine, which was designed in the early 1800s to calculate mathematical tables for logarithms, trig functions, etc. to 40 (!) digits.
The underlying idea is that you begin with a list of values that you know how to evaluate. You subtract each value from its neighboring value, giving you a list of first differences. Then you subtract each difference value from its neighboring difference value, giving you a list of second differences. Continue until you reach a level where all the difference values are equal. For a second-order polynomial (such as x^2) the second differences will all be equal. For a third-order polynomial, the third differences will all be equal. And so on.
So for calculating squares, you end up with this:
Value First Difference Second Difference
0
1 1
4 3 2
9 5 2
16 7 2
25 9 2
Now you can reverse the process. Start with a result of 0. Add the first difference (1), giving the next result (1). Then increase the first difference by the second difference (2), giving the next first difference (3); add that to the previous result (1), giving the next result (4). Then increase the new first difference (3) by the second difference (2), giving the next first difference (5); add that to the previous result (4), giving the next result (9). Continue until done.
So with only the first value (0), the first difference (1), and the (constant) second difference (2), we can generate as long a list of squares as we would like.
When you need a list of results, calculating them one after another like this replaces multiplication with addition, which, back in the olden days, was much faster. Further, if a computer (back when a computer was a person who did tedious calculations to produce mathematical) made a mistake, all the results after that mistake would be wrong, too, so the mathematician in charge of the project didn't have to provide for checking every result; spot checking was sufficient.
Calculating trig functions, of course, is a bit more tricky, because they aren't defined by polynomials. But over a small enough region they can be approximated by a polynomial.
Babbage's engine would have calculated 40 digit values, with up to 7 levels of differences. It mechanically went through the sequence of steps I mentioned above, grinding out results at a rate of one every few seconds. Babbage didn't actually build the full difference engine; he got an inspiration for a much more powerful "Analytical engine" which he also never built. It would have been a precursor to modern digital computers, with 1000 40-digit storage units, an arithmetic processor, and punched cards to control the sequence of operations.
Related
From a given array (call it numbers[]), i want another array (results[]) which contains all sum possibilities between elements of the first array.
For example, if I have numbers[] = {1,3,5}, results[] will be {1,3,5,4,8,6,9,0}.
there are 2^n possibilities.
It doesn't matter if a number appears two times because results[] will be a set
I did it for sum of pairs or triplet, and it's very easy. But I don't understand how it works when we sum 0, 1, 2 or n numbers.
This is what I did for pairs :
std::unordered_set<int> pairPossibilities(std::vector<int> &numbers) {
std::unordered_set<int> results;
for(int i=0;i<numbers.size()-1;i++) {
for(int j=i+1;j<numbers.size();j++) {
results.insert(numbers.at(i)+numbers.at(j));
}
}
return results;
}
Also, assuming that the numbers[] is sorted, is there any possibility to sort results[] while we fill it ?
Thanks!
This can be done with Dynamic Programming (DP) in O(n*W) where W = sum{numbers}.
This is basically the same solution of Subset Sum Problem, exploiting the fact that the problem has optimal substructure.
DP[i, 0] = true
DP[-1, w] = false w != 0
DP[i, w] = DP[i-1, w] OR DP[i-1, w - numbers[i]]
Start by following the above solution to find DP[n, sum{numbers}].
As a result, you will get:
DP[n , w] = true if and only if w can be constructed from numbers
Following on from the Dynamic Programming answer, You could go with a recursive solution, and then use memoization to cache the results, top-down approach in contrast to Amit's bottom-up.
vector<int> subsetSum(vector<int>& nums)
{
vector<int> ans;
generateSubsetSum(ans,0,nums,0);
return ans;
}
void generateSubsetSum(vector<int>& ans, int sum, vector<int>& nums, int i)
{
if(i == nums.size() )
{
ans.push_back(sum);
return;
}
generateSubsetSum(ans,sum + nums[i],nums,i + 1);
generateSubsetSum(ans,sum,nums,i + 1);
}
Result is : {9 4 6 1 8 3 5 0} for the set {1,3,5}
This simply picks the first number at the first index i adds it to the sum and recurses. Once it returns, the second branch follows, sum, without the nums[i] added. To memoize this you would have a cache to store sum at i.
I would do something like this (seems easier) [I wanted to put this in comment but can't write the shifting and removing an elem at a time - you might need a linked list]
1 3 5
3 5
-----
4 8
1 3 5
5
-----
6
1 3 5
3 5
5
------
9
Add 0 to the list in the end.
Another way to solve this is create a subset arrays of vector of elements then sum up each array's vector's data.
e.g
1 3 5 = {1, 3} + {1,5} + {3,5} + {1,3,5} after removing sets of single element.
Keep in mind that it is always easier said than done. A single tiny mistake along the implemented algorithm would take a lot of time in debug to find it out. =]]
There has to be a binary chop version, as well. This one is a bit heavy-handed and relies on that set of answers you mention to filter repeated results:
Split the list into 2,
and generate the list of sums for each half
by recursion:
the minimum state is either
2 entries, with 1 result,
or 3 entries with 3 results
alternatively, take it down to 1 entry with 0 results, if you insist
Then combine the 2 halves:
All the returned entries from both halves are legitimate results
There are 4 additional result sets to add to the output result by combining:
The first half inputs vs the second half inputs
The first half outputs vs the second half inputs
The first half inputs vs the second half outputs
The first half outputs vs the second half outputs
Note that the outputs of the two halves may have some elements in common, but they should be treated separately for these combines.
The inputs can be scrubbed from the returned outputs of each recursion if the inputs are legitimate final results. If they are they can either be added back in at the top-level stage or returned by the bottom level stage and not considered again in the combining.
You could use a bitfield instead of a set to filter out the duplicates. There are reasonably efficient ways of stepping through a bitfield to find all the set bits. The max size of the bitfield is the sum of all the inputs.
There is no intelligence here, but lots of opportunity for parallel processing within the recursion and combine steps.
Given n(n<=1000000) positive integer numbers (each number is smaller than 1000000). The task is to calculate the sum of the bitwise xor ( ^ in c/c++) value of all the distinct combination of the given numbers.
Time limit is 1 second.
For example, if 3 integers are given as 7, 3 and 5, answer should be 7^3 + 7^5 + 3^5 = 12.
My approach is:
#include <bits/stdc++.h>
using namespace std;
int num[1000001];
int main()
{
int n, i, sum, j;
scanf("%d", &n);
sum=0;
for(i=0;i<n;i++)
scanf("%d", &num[i]);
for(i=0;i<n-1;i++)
{
for(j=i+1;j<n;j++)
{
sum+=(num[i]^num[j]);
}
}
printf("%d\n", sum);
return 0;
}
But my code failed to run in 1 second. How can I write my code in a faster way, which can run in 1 second ?
Edit: Actually this is an Online Judge problem and I am getting Cpu Limit Exceeded with my above code.
You need to compute around 1e12 xors in order to brute force this. Modern processors can do around 1e10 such operations per second. So brute force cannot work; therefore they are looking for you to figure out a better algorithm.
So you need to find a way to determine the answer without computing all those xors.
Hint: can you think of a way to do it if all the input numbers were either zero or one (one bit)? And then extend it to numbers of two bits, three bits, and so on?
When optimising your code you can go 3 different routes:
Optimising the algorithm.
Optimising the calls to language and library functions.
Optimising for the particular architecture.
There may very well be a quicker mathematical way of xoring every pair combination and then summing them up, but I know it not. In any case, on the contemporary processors you'll be shaving off microseconds at best; that is because you are doing basic operations (xor and sum).
Optimising for the architecture also makes little sense. It normally becomes important in repetitive branching, you have nothing like that here.
The biggest problem in your algorithm is reading from the standard input. Despite the fact that "scanf" takes only 5 characters in your computer code, in machine language this is the bulk of your program. Unfortunately, if the data will actually change each time your run your code, there is no way around the requirement of reading from stdin, and there will be no difference whether you use scanf, std::cin >>, or even will attempt to implement your own method to read characters from input and convert them into ints.
All this assumes that you don't expect a human being to enter thousands of numbers in less than one second. I guess you can be running your code via: myprogram < data.
This function grows quadratically (thanks #rici). At around 25,000 positive integers with each being 999,999 (worst case) the for loop calculation alone can finish in approximately a second. Trying to make this work with input as you have specified and for 1 million positive integers just doesn't seem possible.
With the hint in Alan Stokes's answer, you may have a linear complexity instead of quadratic with the following:
std::size_t xor_sum(const std::vector<std::uint32_t>& v)
{
std::size_t res = 0;
for (std::size_t b = 0; b != 32; ++b) {
const std::size_t count_0 =
std::count_if(v.begin(), v.end(),
[b](std::uint32_t n) { return (n >> b) & 0x01; });
const std::size_t count_1 = v.size() - count_0;
res += count_0 * count_1 << b;
}
return res;
}
Live Demo.
Explanation:
x^y = Sum_b((x&b)^(y&b)) where b is a single bit mask (from 1<<0 to 1<<32).
For a given bit, with count_0 and count_1 the respective number of count of number with bit set to 0 or 1, we have count_0 * (count_0 - 1) 0^0, count_0 * count_1 0^1 and count_1 * (count_1 - 1) 1^1 (and 0^0 and 1^1 are 0).
What is the fastest way to find the sum of decimal digits?
The following code is what I wrote but it is very very slow for range 1 to 1000000000000000000
long long sum_of_digits(long long input) {
long long total = 0;
while (input != 0) {
total += input % 10;
input /= 10;
}
return total;
}
int main ( int argc, char** argv) {
for ( long long i = 1L; i <= 1000000000000000000L; i++) {
sum_of_digits(i);
}
return 0;
}
I'm assuming what you are trying to do is along the lines of
#include <iostream>
const long long limit = 1000000000000000000LL;
int main () {
long long grand_total = 0;
for (long long ii = 1; ii <= limit; ++ii) {
grand_total += sum_of_digits(i);
}
std::cout << "Grand total = " << grand_total << "\n";
return 0;
}
This won't work for two reasons:
It will take a long long time.
It will overflow.
To deal with the overflow problem, you will either have to put a bound on your upper limit or use some bignum package. I'll leave solving that problem up to you.
To deal with the computational burden you need to get creative. If you know the upper limit is limited to powers of 10 this is fairly easy. If the upper limit can be some arbitrary number you will have to get a bit more creative.
First look at the problem of computing the sum of digits of all integers from 0 to 10n-1 (e.g., 0 to 9 (n=1), 0 to 99 (n=2), etc.) Denote the sum of digits of all integers from 10n-1 as Sn. For n=1 (0 to 9), this is just 0+1+2+3+4+5+6+7+8+9=45 (9*10/2). Thus S1=45.
For n=2 (0 to 99), you are summing 0-9 ten times and you are summing 0-9 ten times again. For n=3 (0 to 999), you are summing 0-99 ten times and you are summing 0-9 100 times. For n=4 (0 to 9999), you are summing 0-999 ten times and you are summing 0-9 1000 times. In general, Sn=10Sn-1+10n-1S1 as a recursive expression. This simplifies to Sn=(9n10n)/2.
If the upper limit is of the form 10n, the solution is the above Sn plus one more for the number 1000...000. If the upper limit is an arbitrary number you will need to get creative once again. Think along the lines that went into developing the formula for Sn.
You can break this down recursively. The sum of the digits of an 18-digit number are the sums of the first 9 digits plus the last 9 digits. Likewise the sum of the digits of a 9-bit number will be the sum of the first 4 or 5 digits plus the sum of the last 5 or 4 digits. Naturally you can special-case when the value is 0.
Reading your edit: computing that function in a loop for i between 1 and 1000000000000000000 takes a long time. This is a no brainer.
1000000000000000000 is one billion billion. Your processor will be able to do at best billions of operations per second. Even with a nonexistant 4-5 Ghz processor, and assuming best case it compiles down to an add, a mod, a div, and a compare jump, you could only do 1 billion iterations per second, meaning it will take on the order of 1 billion seconds.
You probably don't want to do it in a bruteforce way. This seems to be more of a logical thinking question.
Note - 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 = N(N+1)/2 = 45.
---- Changing the answer to make it clearer after David's comment
See David's answer - I had it wrong
Quite late to the party, but anyways, here is my solution. Sorry it's in Python and not C++, but it should be relatively easy to translate. And because this is primarily an algorithm problem, I hope that's ok.
As for the overflow problem, the only thing that comes to mind is to use arrays of digits instead of actual numbers. Given this algorithm I hope it won't affect performance too much.
https://gist.github.com/frnhr/7608873
It uses these three recursions I found by looking and poking at the problem. Rather then trying to come up with some general and arcane equations, here are three examples. A general case should be easily visible from those.
relation 1
Reduces function calls with arbitrary argument to several recursive calls with more predictable arguments for use in relations 2 and 3.
foo(3456) == foo(3000)
+ foo(400) + 400 * (3)
+ foo(50) + 50 * (3 + 4)
+ foo(6) + 6 * (3 + 4 + 5)
relation 2
Reduce calls with an argument in the form L*10^M (e.g: 30, 7000, 900000) to recursive call usable for relation 3. These triangular numbers popped in quite uninvited (but welcome) :)
triangular_numbers = [0, 1, 3, 6, 10, 15, 21, 28, 36] # 0 not used
foo(3000) == 3 * foo(1000) + triangular_numbers[3 - 1] * 1000
Only useful if L > 1. It holds true for L = 1 but is trivial. In that case, go directly to relation 3.
relation 3
Recursively reduce calls with argument in format 1*10^M to a call with argument that's divided by 10.
foo(1000) == foo(100) * 10 + 44 * 100 + 100 - 9 # 44 and 9 are constants
Ultimately you only have to really calculate the sum or digits for numbers 0 to 10, and it turns out than only up to 3 of these calculations are needed. Everything else is taken care of with this recursion. I'm pretty sure it runs in O(logN) time. That's FAAST!!!!!11one
On my laptop it calculates the sum of digit sums for a given number with over 1300 digits in under 7 seconds! Your test (1000000000000000000) gets calculated in 0.000112057 seconds!
I think you cannot do better than O(N) where N is the number of digits in the given number(which is not computationally expensive)
However if I understood your question correctly (the range) you want to output the sum of digits for a range of numbers. In that case, you can increment by one when you go from number0 to number9 and then decrease by 8.
You will need to cheat - look for mathematical patterns that let you short-cut your computations.
For example, do you really need to test that input != 0 every time? Does it matter if you add 0/10 several times? Since it won't matter, consider unrolling the loop.
Can you do the calculation in a larger base, eg, base 10^2, 10^3, etcetera, that might allow you to reduce the number of digits, which you'll then have to convert back to base 10? If this works, you'll be able to implement a cache more easily.
Consider looking at compiler intrinsics that let you give hints to the compiler for branch prediction.
Given that this is C++, consider implementing this using template metaprogramming.
Given that sum_of_digits is purely functional, consider caching the results.
Now, most of those suggestions will backfire - but the point I'm making is that if you have hit the limits of what your computer can do for a given algorithm, you do need to find a different solution.
This is probably an excellent starting point if you want to investigate this in detail: http://mathworld.wolfram.com/DigitSum.html
Possibility 1:
You could make it faster by feeding the result of one iteration of the loop into the next iteration.
For example, if i == 365, the result is 14. In the next loop, i == 366 -- 1 more than the previous result. The sum is also 1 more: 3 + 6 + 6 = 15.
Problems arise when there is a carry digit. If i == 99 (ie. result = 18), the next loop's result isn't 19, it's 1. You'll need extra code to detect this case.
Possibility 2:
While thinking though the above, it occurred to me that the sequence of results from sum_of_digits when graphed would resemble a sawtooth. With some analysis of the resulting graph (which I leave as an exercise for the reader), it may be possible to identify a method to allow direct calculation of the sum result.
However, as some others have pointed out: Even with the fastest possible implementation of sum_of_digits and the most optimised loop code, you can't possibly calculate 1000000000000000000 results in any useful timeframe, and certainly not in less than one second.
Edit: It seems you want the the sum of the actual digits such that: 12345 = 1+2+3+4+5 not the count of digits, nor the sum of all numbers 1 to 12345 (inclusive);
As such the fastest you can get is:
long long sum_of_digits(long long input) {
long long total = input % 10;
while ((input /= 10) != 0)
total += input % 10;
return total;
}
Which is still going to be slow when you're running enough iterations. Your requirement of 1,000,000,000,000,000,000L iterations is One Million, Million, Million. Given 100 Million takes around 10,000ms on my computer, one can expect that it will take 100ms per 1 million records, and you want to do that another million million times. There are only 86400 seconds in a day, so at best we can compute around 86,400 Million records per day. It would take one computer
Lets suppose your method could be performed in a single float operation (somehow), suppose you are using the K computer which is currently the fastest (Rmax) supercomputer at over 10 petaflops, if you do the math that is = 10,000 Million Million floating operations per second. This means that your 1 Million, Million, Million loop will take the world's fastest non-distributed supercomputer 100 seconds to compute the sums (IF it took 1 float operation to calculate, which it can't), so you will need to wait around for quite some time for computers to become 100 so much more powerful for your solution to be runable in under one second.
What ever you're trying to do, you're either trying to do an unsolvable problem in near real-time (eg: graphics calculation related) or you misunderstand the question / task that was given you, or you are expected to perform something faster than any (non-distributed) computer system can do.
If your task is actually to sum all the digits of a range as you show and then output them, the answer is not to improve the for loop. for example:
1 = 0
10 = 46
100 = 901
1000 = 13501
10000 = 180001
100000 = 2250001
1000000 = 27000001
10000000 = 315000001
100000000 = 3600000001
From this you could work out a formula to actually compute the total sum of all digits for all numbers from 1 to N. But it's not clear what you really want, beyond a much faster computer.
No the best, but simple:
int DigitSumRange(int a, int b) {
int s = 0;
for (; a <= b; a++)
for(c : to_string(a))
s += c-48;
return s;
}
A Python function is given below, which converts the number to a string and then to a list of digits and then finds the sum of these digits.
def SumDigits(n):
ns=list(str(n))
z=[int(d) for d in ns]
return(sum(z))
In C++ one of the fastest way can be using strings.
first of all get the input from users in a string. Then add each element of string after converting it into int. It can be done using -> (str[i] - '0').
#include<iostream>
#include<string>
using namespace std;
int main()
{ string str;
cin>>str;
long long int sum=0;
for(long long int i=0;i<str.length();i++){
sum = sum + (str[i]-'0');
}
cout<<sum;
}
The formula for finding the sum of the digits of numbers between 1 to N is:
(1 + N)*(N/2)
[http://mathforum.org/library/drmath/view/57919.html][1]
There is a class written in C# which supports a number with more than the supported max-limit of long.
You can find it here. [Oyster.Math][2]
Using this class, I have generated a block of code in c#, may be its of some help to you.
using Oyster.Math;
class Program
{
private static DateTime startDate;
static void Main(string[] args)
{
startDate = DateTime.Now;
Console.WriteLine("Finding Sum of digits from {0} to {1}", 1L, 1000000000000000000L);
sum_of_digits(1000000000000000000L);
Console.WriteLine("Time Taken for the process: {0},", DateTime.Now - startDate);
Console.ReadLine();
}
private static void sum_of_digits(long input)
{
var answer = IntX.Multiply(IntX.Parse(Convert.ToString(1 + input)), IntX.Parse(Convert.ToString(input / 2)), MultiplyMode.Classic);
Console.WriteLine("Sum: {0}", answer);
}
}
Please ignore this comment if it is not relevant for your context.
[1]: https://web.archive.org/web/20171225182632/http://mathforum.org/library/drmath/view/57919.html
[2]: https://web.archive.org/web/20171223050751/http://intx.codeplex.com/
If you want to find the sum for the range say 1 to N then simply do the following
long sum = N(N+1)/2;
it is the fastest way.
What/where are the practical uses of the partial_sum algorithm in STL?
What are some other interesting/non-trivial examples or use-cases?
I used it to reduce memory usage of a simple mark-sweep garbage collector in my toy lambda calculus interpreter.
The GC pool is an array of objects of identical size. The goal is to eliminate objects that aren't linked to other objects, and condense the remaining objects into the beginning of the array. Since the objects are moved in memory, each link needs to be updated. This necessitates an object remapping table.
partial_sum allows the table to be stored in compressed format (as little as one bit per object) until the sweep is complete and memory has been freed. Since the objects are small, this significantly reduces memory use.
Recursively mark used objects and populate the Boolean array.
Use remove_if to condense the marked objects to the beginning of the pool.
Use partial_sum over the Boolean values to generate a table of pointers/indexes into the new pool.
This works because the Nth marked object has N preceding 1's in the array and acquires pool index N.
Sweep over the pool again and replace each link using the remap table.
It's especially friendly to the data cache to put the remap table in the just-freed, thus still hot, memory.
One thing to note about partial sum is that it is the operation that undoes adjacent difference much like - undoes +. Or better yet if you remember calculus the way differentiation undoes integration. Better because adjacent difference is essentially differentiation and partial sum is integration.
Let's say you have simulation of a car and at each time step you need to know the position, velocity, and acceleration. You only need to store one of those values as you can compute the other two. Say you store the position at each time step you can take the adjacent difference of the position to give the velocity and the adjacent difference of the velocity to give the acceleration. Alternatively, if you store the acceleration you can take the partial sum to give the velocity and the partial sum of the velocity gives the position.
Partial sum is one of those functions that doesn't come up too often for most people but is enormously useful when you find the right situation. A lot like calculus.
Last time I (would have) used it is when converting a discrete probability distribution (an array of p(X = k)) into a cumulative distribution (an array of p(X <= k)). To select once from the distribution, you can pick a number from [0-1) randomly, then binary search into the cumulative distribution.
That code wasn't in C++, though, so I did the partial sum myself.
You can use it to generate a monotonically increasing sequence of numbers. For example, the following generates a vector containing the numbers 1 through 42:
std::vector<int> v(42, 1);
std::partial_sum(v.begin(), v.end(), v.begin());
Is this an everyday use case? Probably not, though I've found it useful on several occasions.
You can also use std::partial_sum to generate a list of factorials. (This is even less useful, though, since the number of factorials that can be represented by a typical integer data type is quite limited. It is fun, though :-D)
std::vector<int> v(10, 1);
std::partial_sum(v.begin(), v.end(), v.begin());
std::partial_sum(v.begin(), v.end(), v.begin(), std::multiplies<int>());
Personal Use Case: Roulette-Wheel-Selection
I'm using partial_sum in a roulette-wheel-selection algorithm (link text). This algorithm choses randomly elements from a container with a probability which is linear to some value given beforehands.
Because all my elements to choose from bringing a not-necessarily normalized value, I use the partial_sum algorithm for constructing something like a "roulette-wheel", because I sum up all the elements. Then I chose a random variable in this range (the last partial_sum is the sum of all) and use stl::lower_bound for searching "the wheel" where my random search landed. The element returned by the lower_bound algorithm is the chosen one.
Besides the advantage of clear and expressive code with the use of partial_sum, I could also gain some speed when experimenting with the GCC parallel mode which brings parallelized versions for some algorithms and one of them is the partial_sum (link text).
Another use I know of: One of the most important algorithmic primitives in parallel processing (but maybe a little bit away from STL)
If you're interested in heavy optimized algorithms which are using partial_sum (in this case maybe more results under the synonyms "scan" or "prefix_sum"), than go to the parallel algorithms community. They need it all the time. You won't find a parallel sorting algorithm based on quicksort or mergesort without using it. This operation is one of the most important parallel primitives used. I think it is most commonly used for calculating offsets in dynamic algorithms. Think of a partition step in quicksort, which is split and fed to the parallel threads. You don't know the number of elements in each slot of the partition before calculating it. So you need some offsets for all the threads for later access.
Maybe you will find more informatin in the now-hot topic of GPU processing. One short article regarding Nvidia's CUDA and the scan-primitive with a few application examples you will find in Chapter 39. Parallel Prefix Sum (Scan) with CUDA.
Personal Use Case: intermediate step in counting sort from CLRS:
COUNTING_SORT (A, B, k)
for i ← 1 to k do
c[i] ← 0
for j ← 1 to n do
c[A[j]] ← c[A[j]] + 1
//c[i] now contains the number of elements equal to i
// std::partial_sum here
for i ← 2 to k do
c[i] ← c[i] + c[i-1]
// c[i] now contains the number of elements ≤ i
for j ← n downto 1 do
B[c[A[i]]] ← A[j]
c[A[i]] ← c[A[j]] - 1
You could build a "moving sum" (precursor to a moving average):
template <class T>
void moving_sum (const vector<T>& in, int num, vector<T>& out)
{
// cummulative sum
partial_sum (in.begin(), in.end(), out.begin());
// shift and subtract
int j;
for (int i = out.size() - 1; i >= 0; i--) {
j = i - num;
if (j >= 0)
out[i] -= out[j];
}
}
And then call it with:
vector<double> v(10);
// fill in v
vector<double> v2 (v.size());
moving_sum (v, 3, v2);
You know, I actually did use partial_sum() once... It was this interesting little problem that I was asked on a job interview. I enjoyed it so much, I went home and coded it up.
The problem was: Given a sequential sequence of integers, find the shortest sub-sequence with the highest value. E.g. Given:
Value: -1 2 3 -1 4 -2 -4 5
Index: 0 1 2 3 4 5 6 7
We would find the subsequence [1,4]
Now the obvious solution is to run with 3 for loops, iterating over all possible starts & ends, and adding up the value of each possible subsequence in turn. Inefficient, but quick to code up and hard to make mistakes. (Especially when the third for loop is just accumulate(start,end,0).)
The correct solution involves a divide-and-conquer / bottom up approach. E.g. Divide the problem space in half, and for each half compute the largest subsequence contained within that section, the largest subsequence including the starting number, the largest subsequence including the ending number, and the entire section's subsequence. Armed with this data we can then combine the two halves together without any further evaluation of either one. Obviously the data for each half can be computed by further breaking each half into halves (quarters), each quarter into halves (eighths), and so on until we have trivial singleton cases. It's all quite efficient.
But all that aside, there's a third (somewhat less efficient) option that I wanted to explore. It's similar to the 3-for-loop case, only we add the adjacent numbers to avoid so much work. The idea is that there's no need to add a+b, a+b+c, and a+b+c+d when we can add t1=a+b, t2=t1+c, and t3=t2+d. It's a space/computation tradeoff thing. It works by transforming the sequence:
Index: 0 1 2 3 4
FROM: 1 2 3 4 5
TO: 1 3 6 10 15
Thereby giving us all possible substrings starting at index=0 and ending at indexes=0,1,2,3,4.
Then we iterate over this set subtracting the successive possible "start" points...
FROM: 1 3 6 10 15
TO: - 2 5 9 14
TO: - - 3 7 12
TO: - - - 4 9
TO: - - - - 5
Thereby giving us the values (sums) of all possible subsequences.
We can find the maximum value of each iteration via max_element().
The first step is most easily accomplished via partial_sum().
The remaining steps via a for loop and transform(data+i,data+size,data+i,bind2nd(minus<TYPE>(),data[i-1])).
Clearly O(N^2). But still interesting and fun...
Partial sums are often useful in parallel algorithms. Consider the code
for (int i=0; N>i; ++i) {
sum += x[i];
do_something(sum);
}
If you want to parallelise this code, you need to know the partial sums. I am using GNUs parallel version of partial_sum for something very similar.
I often use partial sum not to sum but to calculate the current value in the sequence depending on the previous.
For example, if you integrate a function. Each new step is a previous step, vt += dvdt or vt = integrate_step(dvdt, t_prev, t_prev+dt);.
In nonparametric Bayesian methods there is a Metropolis-Hastings step (per observation) that determines to sample a new or an existing cluster. If an existing cluster has to be sampled this needs to be done with different weights. These weighted likelihoods are simulated in the following example code.
#include <random>
#include <iostream>
#include <algorithm>
int main() {
std::default_random_engine generator(std::random_device{}());
std::uniform_real_distribution<double> distribution(0.0,1.0);
int K = 8;
std::vector<double> weighted_likelihood(K);
for (int i = 0; i < K; ++i) {
weighted_likelihood[i] = i*10;
}
std::cout << "Weighted likelihood: ";
for (auto i: weighted_likelihood) std::cout << i << ' ';
std::cout << std::endl;
std::vector<double> cumsum_likelihood(K);
std::partial_sum(weighted_likelihood.begin(), weighted_likelihood.end(), cumsum_likelihood.begin());
std::cout << "Cumulative sum of weighted likelihood: ";
for (auto i: cumsum_likelihood) std::cout << i << ' ';
std::cout << std::endl;
std::vector<int> frequency(K);
int N = 280000;
for (int i = 0; i < N; ++i) {
double pick = distribution(generator) * cumsum_likelihood.back();
auto lower = std::lower_bound(cumsum_likelihood.begin(), cumsum_likelihood.end(), pick);
int index = std::distance(cumsum_likelihood.begin(), lower);
frequency[index]++;
}
std::cout << "Frequencies: ";
for (auto i: frequency) std::cout << i << ' ';
std::cout << std::endl;
}
Note that this is not different from the answer by https://stackoverflow.com/users/13005/steve-jessop. It's added to give a bit more context about a particular situation (nonparametric Bayesian mehods, e.g. the algorithms by Neal using the Dirichlet process as prior) and the actual code which uses partial_sum in combination with lower_bound.
I am trying to write a C++ program that works like the game 24. For those who don't know how it is played, basically you try to find any way that 4 numbers can total 24 through the four algebraic operators of +, -, /, *, and parenthesis.
As an example, say someone inputs 2,3,1,5
((2+3)*5) - 1 = 24
It was relatively simple to code the function to determine if three numbers can make 24 because of the limited number of positions for parenthesis, but I can not figure how code it efficiently when four variables are entered.
I have some permutations working now but I still cannot enumerate all cases because I don't know how to code for the cases where the operations are the same.
Also, what is the easiest way to calculate the RPN? I came across many pages such as this one:
http://www.dreamincode.net/forums/index.php?showtopic=15406
but as a beginner, I am not sure how to implement it.
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
bool MakeSum(int num1, int num2, int num3, int num4)
{
vector<int> vi;
vi.push_back(num1);
vi.push_back(num2);
vi.push_back(num3);
vi.push_back(num4);
sort(vi.begin(),vi.end());
char a1 = '+';
char a2 = '-';
char a3 = '*';
char a4 = '/';
vector<char> va;
va.push_back(a1);
va.push_back(a2);
va.push_back(a3);
va.push_back(a4);
sort(va.begin(),va.end());
while(next_permutation(vi.begin(),vi.end()))
{
while(next_permutation(va.begin(),va.end()))
{
cout<<vi[0]<<vi[1]<<vi[2]<<vi[3]<< va[0]<<va[1]<<va[2]<<endl;
cout<<vi[0]<<vi[1]<<vi[2]<<va[0]<< vi[3]<<va[1]<<va[2]<<endl;
cout<<vi[0]<<vi[1]<<vi[2]<<va[0]<< va[1]<<vi[3]<<va[2]<<endl;
cout<<vi[0]<<vi[1]<<va[0]<<vi[2]<< vi[3]<<va[1]<<va[2]<<endl;
cout<<vi[0]<<vi[1]<<va[0]<<vi[2]<< va[1]<<vi[3]<<va[2]<<endl;
}
}
return 0;
}
int main()
{
MakeSum(5,7,2,1);
return 0;
}
So, the simple way is to permute through all possible combinations. This is slightly tricky, the order of the numbers can be important, and certainly the order of operations is.
One observation is that you are trying to generate all possible expression trees with certain properties. One property is that the tree will always have exactly 4 leaves. This means the tree will also always have exactly 3 internal nodes. There are only 3 possible shapes for such a tree:
A
/ \
N A
/ \ (and the mirror image)
N A
/ \
N N
A
/ \
N A
/ \
A N (and the mirror image)
/ \
N N
A
/` `\
A A
/ \ / \
N N N N
In each spot for A you can have any one of the 4 operations. In each spot for N you can have any one of the numbers. But each number can only appear for one N.
Coding this as a brute force search shouldn't be too hard, and I think that after you have things done this way it will become easier to think about optimizations.
For example, + and * are commutative. This means that mirrors that flip the left and right children of those operations will have no effect. It might be possible to cut down searching through all such flips.
Someone else mentioned RPN notation. The trees directly map to this. Here is a list of all possible trees in RPN:
N N N N A A A
N N N A N A A
N N N A A N A
N N A N N A A
N N A N A N A
That's 4*3*2 = 24 possibilities for numbers, 4*4*4 = 64 possibilities for operations, 24 * 64 * 5 = 7680 total possibilities for a given set of 4 numbers. Easily countable and can be evaluated in a tiny fraction of a second on a modern system. Heck, even in basic on my old Atari 8 bit I bet this problem would only take minutes for a given group of 4 numbers.
You can just use Reverse Polish Notation to generate the possible expressions, which should remove the need for parantheses.
An absolutely naive way to do this would be to generate all possible strings of 4 digits and 3 operators (paying no heed to validity as an RPN), assume it is in RPN and try to evaluate it. You will hit some error cases (as in invalid RPN strings). The total number of possibilities (if I calculated correctly) is ~50,000.
A more clever way should get it down to ~7500 I believe (64*24*5 to be exact): Generate a permutation of the digits (24 ways), generate a triplet of 3 operators (4^3 = 64 ways) and now place the operators among the digits to make it valid RPN(there are 5 ways, see Omnifarious' answer).
You should be able to find permutation generators and RPN calculators easily on the web.
Hope that helps!
PS: Just FYI: RPN is nothing but the postorder traversal of the corresponding expression tree, and for d digits, the number is d! * 4^(d-1) * Choose(2(d-1), (d-1))/d. (The last term is a catalan number).
Edited: The solution below is wrong. We also need to consider the numbers makeable with just x_2 and x_4, and with just x_1 and x_4. This approach can still work, but it's going to be rather more complex (and even less efficient). Sorry...
Suppose we have four numbers x_1, x_2, x_3, x_4. Write
S = { all numbers we can make just using x_3, x_4 },
Then we can rewrite the set we're interested in, which I'll call
T = { all numbers we can make using x_1, x_2, x_3, x_4 }
as
T = { all numbers we can make using x_1, x_2 and some s from S }.
So an algorithm is to generate all possible numbers in S, then use each number s in S in turn to generate part of T. (This will generalise fairly easily to n numbers instead of just 4).
Here's a rough, untested code example:
#include <set> // we can use std::set to store integers without duplication
#include <vector> // we might want duplication in the inputs
// the 2-number special case
std::set<int> all_combinations_from_pair(int a, int b)
{
std::set results;
// here we just use brute force
results.insert(a+b); // = b+a
results.insert(a-b);
results.insert(b-a);
results.insert(a*b); // = b*a
// need to make sure it divides exactly
if (a%b==0) results.insert(a/b);
if (b%a==0) results.insert(b/a);
return results;
}
// the general case
std::set<int> all_combinations_from(std::vector<int> inputs)
{
if (inputs.size() == 2)
{
return all_combinations_from_pair(inputs[0], inputs[1]);
}
else
{
std::set<int> S = all_combinations_from_pair(inputs[0], inputs[1]);
std::set<int> T;
std::set<int> rest = S;
rest.remove(rest.begin());
rest.remove(rest.begin()); // gets rid of first two
for (std::set<int>.iterator i = S.begin(); i < S.end(); i++)
{
std::set<int> new_inputs = S;
new_inputs.insert(*i);
std::set<int> new_outputs = all_combinations_from(new_inputs);
for (std::set<int>.iterator j = new_outputs.begin(); j < new_outputs.end(); j++)
T.insert(*j); // I'm sure you can do this with set_union()
}
return T;
}
}
If you are allowed to use the same operator twice, you probably don't want to mix the operators into the numbers. Instead, perhaps use three 0's as a placeholder for where operations will occur (none of the 4 numbers are 0, right?) and use another structure to determine which operations will be used.
The second structure could be a vector<int> initialized with three 1's followed by three 0's. The 0's correspond to the 0's in the number vector. If a 0 is preceded by zero 1's, the corresponding operation is +, if preceded by one 1, it's -, etc. For example:
6807900 <= equation of form ( 6 # 8 ) # ( 7 # 9 )
100110 <= replace #'s with (-,-,/)
possibility is (6-8)-(7/9)
Advance through the operation possibilities using next_permutation in an inner loop.
By the way, you can also return early if the number-permutation is an invalid postfix expression. All permutations of the above example less than 6708090 are invalid, and all greater are valid, so you could start with 9876000 and work your way down with prev_permutation.
Look up the Knapsack problem (here's a link to get you started: http://en.wikipedia.org/wiki/Knapsack_problem), this problem is pretty close to that, just a little harder (and the Knapsack problem is NP-complete!)
One thing that might make this faster than normal is parallelisation. Check out OpenMP. Using this, more than one check is carried out at once (your "alg" function) thus if you have a dual/quad core cpu, your program should be faster.
That said, if as suggested above the problem is NP-complete, it'll be faster, not necessarily fast.
i wrote something like this before. You need a recursive evaluator. Call evaluate, when you hit "(" call evaluate again otherwise run along with digits and operators till you hit ")", now return the result of the -+*/ operations the the evaluate instance above you