Finding pair values in a given range - c++

I have an array of N pairs (v1, v2) where v1 <= v2. These are supposed to represent events in time that start at v1 and end at v2. They can be equal, in which case the event is instantaneous. The array is sorted by starting time, v1.
For a given range (L, R), I would like to find any pair where L <= v1 <= R or L <= v2 <= R. The idea here is to get events starting, happening or ending in the given range.
My main problem is efficiency. The array could contain hundreds of thousands of events, so a linear search going through all pairs is not an option.
I read a bit about kd-tree but the problem with it is that it excludes the boundaries of the range and would only return L <= v1 <= R AND L <= v2 <= R. That is, would only return events that actually happen (start AND end) in the range whereas I need start OR end (or both obviously).
I thought also about keeping 2 lookup tables (I use double for time)
std::map<double, Event*> startPoints;
std::map<double, Event*> endPoints;
and then use the std::find algorithm on both of them and merge the results.
I'm just looking for advice on whether this is a good solution or whether there is a more clever way.
EDIT:
Thinking about it again, it is more complicated. Here is an example of the expected results:
L < R : Range is large enough
|---Ev1---| |---Ev3---| |---Ev5---|
|---Ev2---| |---Ev4---|
| |
L R
Here I would like to get Ev2 (which is ending in the range), Ev3 (which is happening in the range) and Ev4 (which is starting in the range).
L < R: Range too small for a complete event
|---Ev1---| |---Ev3---| |---Ev5---|
|---Ev2---| |---Ev4---|
| |
L R
Here I would like to get Ev3 as it is currently running in the range and Ev4 as it is starting in the range
L == R: If I want to know what happens at one point in time
|---Ev1---| |---Ev3---| |---Ev5---|
|---Ev2---| |---Ev4---|
|
LR
Here I would like only Ev2 as it is the only one currently running.

As you need to handle three cases - starting, happening or ending in the given range, we can split it into three parts.
starting: v1 lies in [L,R].
ending: v2 lies in [L,R].
The third case can be formulated as v1 <= R and L <= v2, but the first two cases partially cover this case, so we will use different formulation to avoid collisions:
happening: v1 < L and R < v2
Well, it is easy to handle the first case in logarithmic plus number of reported events time if we can sort the array of events by v1. The same trick works for the second case.
The third case is trickier. Let's draw:
The pink area represents all intervals L <= R. The red dot is an interval and greenish area represents all possible events we want to capture. To do such a capture one can use k2-tree.
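A minimal sketch of the first case (events starting in [L, R]), assuming the events are stored in a std::vector sorted by v1. The Event struct and the function name starting_in_range are illustrative and not taken from the question. The "ending" case would need a second index sorted by v2, and the sketch assumes L <= R.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct Event {
    double v1;  // start time
    double v2;  // end time
};

// Returns the half-open index range [first, last) of the events whose start
// time v1 lies in [L, R]. Precondition: `events` is sorted by v1 and L <= R.
std::pair<std::size_t, std::size_t>
starting_in_range(const std::vector<Event>& events, double L, double R) {
    // First event with v1 >= L.
    auto first = std::lower_bound(
        events.begin(), events.end(), L,
        [](const Event& e, double t) { return e.v1 < t; });
    // First event with v1 > R.
    auto last = std::upper_bound(
        events.begin(), events.end(), R,
        [](double t, const Event& e) { return t < e.v1; });
    return {static_cast<std::size_t>(first - events.begin()),
            static_cast<std::size_t>(last - events.begin())};
}
```

Both searches are binary searches, so the cost is logarithmic plus the number of reported events.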

Using an indexed approach is fine - such as Boost.ICL solution.
That being said, you could easily use a std::vector for this, even unsorted. I think that as long as you are in the range of some 100,000 or even 1,000,000 events you should be fine (provided you store actual values in the vector, not pointers, as chasing pointers can be slow). Exact numbers will of course depend on your thresholds.
struct MyEvent {
double v1;//you use double for time
double v2;
};
std::vector<MyEvent> events;
Here is an example using 1.000.000 elements:
http://coliru.stacked-crooked.com/a/9a6d90348f6915e1
The search runs in 42 ms and consists of one comparison and an optional copy per element; your case may differ a bit, but it should be comparable.
Going further, you could get more power by parallelizing your search in some way, e.g. using std::for_each with a parallel execution policy (C++17).
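As a hedged sketch of the linear scan described above: the predicate below treats an event as a hit when it overlaps [L, R], i.e. v1 <= R and L <= v2, which covers starting, happening and ending in the range. The function name events_in_range is an assumption made for illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

struct MyEvent {
    double v1;  // start time
    double v2;  // end time
};

// Copies every event that overlaps the closed range [L, R].
// Works on unsorted input; cost is one pass over all events.
std::vector<MyEvent> events_in_range(const std::vector<MyEvent>& events,
                                     double L, double R) {
    std::vector<MyEvent> hits;
    std::copy_if(events.begin(), events.end(), std::back_inserter(hits),
                 [L, R](const MyEvent& e) { return e.v1 <= R && L <= e.v2; });
    return hits;
}
```

Calling it with L == R answers the "what is running at this instant" query from the question.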

std::map: finding an element has O(log n) complexity.
If your keys are unique and memory is not a concern, you can use std::unordered_map, whose lookup complexity is O(1) on average.
Also, you don't need to create 2 maps.
std::unordered_map<double, std::pair<Event*, Event*>> StartEndPoints;
If your keys aren't unique you can use std::unordered_multimap, but if keys are repeated a lot, the lookup complexity can degrade to O(n).
I'd suggest not using double as the key type. Hash it first:
std::hash<double> hashing;
auto temp = hashing(key); // decltype of temp will be size_t
std::unordered_map<std::size_t, std::pair<Event*, Event*>> StartEndPoints;

Related

Faster way of searching array of sets

I have an array containing 100,000 sets. Each set contains natural numbers below 1,000,000. I have to find the number of ordered pairs {m, n}, where 0 < m < 1,000,000, 0 < n < 1,000,000 and m != n, which do not exist together in any of 100,000 sets. A naive method of searching through all the sets leads to 10^5 * (10^6 choose 2) number of searches.
For example I have 2 sets set1 = {1,2,4} set2 = {1,3}. All possible ordered pairs of numbers below 5 are {1,2}, {1,3}, {1,4}, {2,3}, {2,4} and {3,4}. The ordered pairs of numbers below 5 which do not exist together in set 1 are {1,3},{2,3} and {3,4}. The ordered pairs below 5 missing in set 2 are {1,2},{1,4},{2,3},{2,4} and {3,4}. The ordered pairs which do not exist together in both the sets are {2,3} and {3,4}. So the count of number of ordered pairs missing is 2.
Can anybody point me to a clever way of organizing my data structure so that finding the number of missing pairs is faster? I apologize in advance if this question has been asked before.
Update:
Here is some information about the structure of my data set.
The number of elements in each set varies from 2 to 500,000. The median number of elements is around 10,000. The distribution peaks around 10,000 and tapers down in both direction. The union of the elements in the 100,000 sets is close to 1,000,000.
If you are looking for combinations across sets, there is a way to meaningfully condense your dataset, as shown in frenzykryger's answer. However, from your examples, what you're looking for is the number of combinations available within each set, meaning each set contains irreducible information. Additionally, you can't use combinatorics to simply obtain the number of combinations from each set either; you ultimately want to deduplicate combinations across all sets, so the actual combinations matter.
Knowing all this, it is difficult to think of any major breakthroughs you could make. Let's say you have i sets and a maximum of k items in each set. The naive approach would be:
If your sets are typically dense (i.e. contain most of the numbers between 1 and 1,000,000), replace them with the complement of the set instead
Create a set of 2 tuples (use a set structure that ensures insertion is idempotent)
For each set O(i):
Evaluate all combinations and insert into set of combinations: O(k choose 2)
The worst case complexity for this isn't great, but assuming you have scenarios where a set either contains most of the numbers between 0 and 1,000,000, or almost none of them, you should see a big improvement in performance.
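The naive approach above can be sketched on a small scale as follows. This is only an illustration, assuming every element lies in [1, max_value]; the function name count_missing_pairs and its parameters are hypothetical. On the question's example (sets {1,2,4} and {1,3}, numbers below 5) it reports 2 missing pairs, matching the count given there.

```cpp
#include <cstdint>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

// Counts unordered pairs {m, n} with 1 <= m < n <= max_value that never
// occur together in any of the given sets.
// Assumption: every element of every set lies in [1, max_value].
std::int64_t count_missing_pairs(const std::vector<std::set<int>>& sets,
                                 std::int64_t max_value) {
    std::set<std::pair<int, int>> found;  // insertion is idempotent
    for (const auto& s : sets) {
        for (auto m = s.begin(); m != s.end(); ++m) {
            for (auto n = std::next(m); n != s.end(); ++n) {
                found.emplace(*m, *n);  // std::set iterates in order, so *m < *n
            }
        }
    }
    const std::int64_t total = max_value * (max_value - 1) / 2;
    return total - static_cast<std::int64_t>(found.size());
}
```

This stores every found pair, so it only scales to small inputs; it is meant as a reference to check faster approaches against.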
Another approach would be to go ahead and use combinatorics to count the number of combinations from each set, then use some efficient approach to find the number of duplicate combinations among sets. I'm not aware of such an approach, but it is possible it exists.
First let's solve the simpler task of counting the number of elements not present in your sets. This task can be reworded in a simpler form: instead of 100,000 sets you can think about 1 set which contains all your numbers. Then the number of elements not present in this set is x = 1000000 - len(set). Now you can use this number x to count the number of combinations. With repetitions: x * x, without repetitions: x * (x - 1). So the bottom line of my answer is to put all your numbers in one big set and use its length to find the number of combinations using combinatorics.
Update
So above we have a way to find number of combinations where each element in combination is not in any of the sets. But question was to find number of combinations where each combination is not present in any of the sets.
Let's try to solve a simpler problem first:
your sets have all numbers in them, none missing
each number is present exactly in one set, no duplicates across sets
How would you construct such combinations over such sets? You would simply pick two elements from different sets, and the resulting combination would not be in any of the sets. The number of such combinations can be counted using the following code (it accepts the sizes of the sets):
long long count_combinations(const std::vector<int>& buckets) {
    long long result = 0; // 64-bit: products of large set sizes overflow int
    for (std::size_t i = 0; i < buckets.size(); ++i) {
        for (std::size_t j = i + 1; j < buckets.size(); ++j) {
            result += static_cast<long long>(buckets[i]) * buckets[j];
        }
    }
    return result;
}
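As a side note, the double loop above can be replaced by a single pass using the identity sum over i<j of b_i*b_j = ((sum of b_i)^2 - sum of b_i^2) / 2. The sketch below assumes the bucket sizes fit comfortably in 64-bit arithmetic (true for a total of about 10^6 elements); the name count_combinations_fast is made up for this example.

```cpp
#include <cstdint>
#include <vector>

// Sum of b[i] * b[j] over all i < j, computed in one pass.
// Assumption: (sum of all sizes)^2 fits in a signed 64-bit integer.
std::int64_t count_combinations_fast(const std::vector<std::int64_t>& buckets) {
    std::int64_t sum = 0;
    std::int64_t sum_of_squares = 0;
    for (std::int64_t b : buckets) {
        sum += b;
        sum_of_squares += b * b;
    }
    // (sum^2 - sum_of_squares) counts every cross product twice.
    return (sum * sum - sum_of_squares) / 2;
}
```

For bucket sizes 3, 2 and 1 both versions give 3*2 + 3*1 + 2*1 = 11.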
Now let's imagine that some numbers are missing. Then we can just add an additional set with those missing numbers to our sets (as a separate set). But we also need to account for the fact that, given n missing numbers, there would be n * (n-1) combinations constructed using only these missing numbers. So the following code will produce the total number of combinations, taking the missing numbers into account:
int missing_numbers = upper_bound - all_numbers.size() - 1;
int missing_combinations = missing_numbers * (missing_numbers - 1);
return missing_combinations + count_combinations(sets, missing_numbers);
Now let's imagine we have a duplicate across two sets: {a, b, c}, {a, d}.
What types of errors will they introduce? The following pairs: {a, a} (a repetition) and {a, d} (a combination which is present in the second set).
So how do we treat such duplicates? We need to eliminate them completely from all sets. Even a single instance of a duplicate will produce a combination present in some set, because we can pick any element from the set where the duplicate was removed and produce such a combination (in my example: if we keep a in the first set, then picking d from the second produces {a, d}; if we keep a in the second set, then picking b or c from the first produces {a, b} and {a, c}). So duplicates shall be removed.
Update
However, we can't simply remove all duplicates. Consider this counterexample:
{a, b} {a, c} {d}. If we simply remove a we get {b} {c} {d} and lose the information about the non-existing combination {a, d}. Consider another counterexample:
{a, b} {a, b, c} {b, d}. If we simply remove duplicates we get {c} {d} and lose the information about {a, d}.
Also, we can't simply apply such logic to pairs of sets. A simple counterexample for numbers < 3: {1, 2} {1} {2}. Here the number of missing combinations is 0, but we would incorrectly count {1, 2} if we applied duplicate removal to pairs of sets. The bottom line is that I can't come up with a good technique that correctly handles duplicate elements across sets.
What you can do, depending on memory requirements, is take advantage of the ordering of Set, and iterate over the values smartly. Something like the code below (untested). You'll iterate over all of your sets, and then for each of your sets you'll iterate over their values. For each of these values, you'll check all of the values in the set after them. Our complexity is reduced to the number of sets times the square of their sizes. You can use a variety of methods to keep track of your found/unfound count, but using a set should be fine, since insertion is simply O(log(n)) where n is no more than 499999500000. In theory using a map of sets (mapping based on the first value) could be slightly faster, but in either case the cost is minimal.
long long numMissing(const std::array<std::set<int>, 100000>& sets){
    std::set<std::pair<int, int>> found;
    for (const auto& s : sets){
        // m and n are iterators: n walks over the values after m
        for (auto m = s.cbegin(); m != s.cend(); ++m){
            for (auto n = std::next(m); n != s.cend(); ++n){
                found.emplace(*m, *n);
            }
        }
    }
    return 499999500000LL - static_cast<long long>(found.size());
}
As an option, you can build Bloom filter(s) over your sets.
Before checking against all sets you can do a quick lookup in your Bloom filter; since it never produces false negatives, a pair it reports as absent is guaranteed not to be present in your sets.
Physically storing each possible pair would take too much memory. We have 100k sets and an average set has 10k numbers = 50M pairs = 400MB with int32 (and set<pair<int, int>> needs much more than 8 bytes per element).
My suggestion is based on two ideas:
don't store, only count the missing pairs
use interval set for compact storage and fast set operations (like boost interval set)
The algorithm is still quadratic on the number of elements in the sets but needs much less space.
Algorithm:
Create the union_set of the individual sets.
We also need a data structure, let's call it sets_for_number to answer this question: which sets contain a particular number? For the simplest case this could be unordered_map<int, vector<int>> (vector stores set indices 0..99999)
Also create the inverse sets for each set. Using interval sets this takes only 10k * 2 * sizeof(int) space per set on average.
dynamic_bitset<> union_set = ...; // union of individual sets (can be vector<bool>)
vector<interval_set<int>> inverse_sets = ...; // numbers 1..999999 not contained in each set
int64_t missing_count = 0;
for (int n = 1; n < 1000000; ++n) {
    // count the missing pairs whose first element is n
    if (!union_set[n]) {
        // all pairs are missing
        missing_count += (999999 - n);
    } else {
        // check which second elements are not present
        interval_set<int> missing_second_elements = interval_set<int>(n+1, 1000000);
        // iterate over all sets containing n
        for (int set_idx : sets_for_number.at(n)) {
            // operator&= is in-place intersection
            missing_second_elements &= inverse_sets[set_idx];
        }
        // counting the number of pairs (n, m) where m is a number
        // that is not present in any of the sets containing n
        for (auto interval : missing_second_elements)
            missing_count += interval.size();
    }
}
If it is possible, keep a set of all numbers and remove each number when you insert it into your array of sets. This has O(n) space complexity.
Of course, if you don't want high space complexity, you could use a vector of ranges: each element in the vector is a pair of numbers giving the start and end of a range.

Is there a better way to implement the 2-SUM algorithm?

Currently, I was trying to create a 2-SUM algorithm that would, given a set of around 1 million integers, find the number of target values t (-10,000 <= t <= 10,000) that are formed by the sum of any two values x,y in the set.
I have no problem with 2-SUM for a single value of t, just by using hash-tables and finding for each hash entry x in the table if there exists another entry t-x. This will run in O(N) time.
But, now I have to find multiple values of t, from -10000 to 10000. If I just use a plain for-loop, then the runtime will now be O(N^2).
I have tried this code, which brute-forces through all t from -10000 to 10000, but it simply runs too slow (~1hr. to execute).
So, my question is, are there any hints for better ways to handle the ~20,001 targets without having to brute-force through all 20,001 values?
Here is the code I used for my O(N^2) solution:
for(long long t = -10000; t <= 10000; t++)
{
    for(unordered_set<long long>::iterator it=S.begin(); it != S.end(); ++it)
    {
        long long value = *it;
        // logical && (not bitwise &) so the second test short-circuits
        if((S.find(t-value) != S.end()) && (t-value != value))
        {
            values++;
            //cout << "Found pair target " << t << " " << value << " " << t-value << '\n';
            break;
        }
    }
}
A better approach would be to use an ordered set (if values are unique, or ordered array / list if you care for duplicates).
Then, you search for a matching pair for your values using the following method:
1. For each Val (-10000, -9999, ...)
2. Let iS be 0
3. Let iE be length - 1
4. While (S[iS] + S[iE]) != Val
4.1 (S[iS] + S[iE]) > Val : Binary Search in (iS -> iE - 1) for the maximum value, lower or equal to (Val - S[iS]) and set iE to match.
4.2 (S[iS] + S[iE]) < Val : Binary Search in (iS +1 -> iE) for the minimum value, higher or equal to (Val - S[iE]) and set iS to match.
4.3 If iS > iE, Val doesn't exist.
This gives you O(n log(n)) for sorting, and O(m n) (m is 20001 for -10000 -> 10000) for searching, although realistically the searching will perform much better than O(m n). The entire solution is O(m n) due to m > log(n).
It can be further optimized by using a map of matched values and on each iteration, after a match is found, advance iE till (S[iS] + S[iE]) > maxValue (10000) and marking all sums as found, then there are less iterations in outer loop.
As other people have already suggested, if you want a "best effort" approach (meaning that it may not be the best, but still good enough), you can sort your data and use std::lower_bound for searching.
The std::lower_bound function is implemented as a binary search, which means that in the worst case, for 1000000 integers you'll be having 20 comparisons to find a match. If you do this inside of a -10000..10000 loop you'll get 20000*20 = 400000 comparisons, which should take far less than an hour (my guess is a few minutes, depending on CPU power).
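One way to put sorting and binary search to work, offered as a sketch rather than as this answer's exact method: for each x, every partner y that yields a target in [lo, hi] must lie in [lo - x, hi - x], and in sorted data that window is a contiguous run found with std::lower_bound and std::upper_bound. The function name count_targets and its parameters are assumptions for illustration; duplicates are removed to mirror the unordered_set in the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Counts the targets t in [lo, hi] that can be written as x + y with
// x != y and both x and y taken from `values`.
std::size_t count_targets(std::vector<long long> values, long long lo, long long hi) {
    std::sort(values.begin(), values.end());
    values.erase(std::unique(values.begin(), values.end()), values.end());

    std::vector<bool> hit(static_cast<std::size_t>(hi - lo + 1), false);
    for (long long x : values) {
        // Partners y with lo <= x + y <= hi form a contiguous sorted run.
        auto first = std::lower_bound(values.begin(), values.end(), lo - x);
        auto last = std::upper_bound(values.begin(), values.end(), hi - x);
        for (auto it = first; it != last; ++it) {
            if (*it != x) {
                hit[static_cast<std::size_t>(x + *it - lo)] = true;
            }
        }
    }
    return static_cast<std::size_t>(std::count(hit.begin(), hit.end(), true));
}
```

The cost is the sort plus two binary searches per element plus the number of (x, y) pairs that actually fall in the window, which is small when the values are spread over a range much wider than hi - lo.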
unordered_set::find is constant time on average but degrades to a linear search in the worst case (many hash collisions); in that worst case you're going to have 20000*1000000 = 20000000000 comparisons, which is 50000 times worse.
You could improve on a binary search (e.g. by seeing how "close" you're to your target and switching to linear search from there if you're under a specific difference in value) but I don't think that would speed up the search that much.
There are other ways, probably faster (maybe you could discard duplicates using 15625 integers with 64-bit precision and setting the bit matching the value in your dataset, giving you O(n) time for the setup and O(1) for the lookup, but you're going to need two sets, one for positive values, the other for negative), but they're going to be much more difficult to implement.
Thanks to everyone who has helped!
I solved the problem by partitioning the input into multiple "buckets": I sort the dataset and then split it into buckets of 10,000 numbers each. So the smallest 10k numbers go into the 1st bucket, the next 10k into the 2nd, and so forth. This way, when I have to search for the entry t-x, I search within 10,000 numbers rather than all 1,000,000.

Generating random integers with a difference constraint

I have the following problem:
Generate M uniformly random integers from the range 0-N, where N >> M, and where no pair has a difference less than K. where M >> K.
At the moment the best method I can think of is to maintain a sorted list, then determine the lower bound of the current generated integer and test it with the lower and upper elements, if it's ok to then insert the element in between. This is of complexity O(nlogn).
Would there happen to be a more efficient algorithm?
An example of the problem:
Generate 1000 uniformly random integers between zero and 100million where the difference between any two integers is no less than 1000
A comprehensive way to solve this would be to:
Determine all the combinations of n-choose-m that satisfy the constraint; let's call this set X.
Select a uniformly random integer i in the range [0,|X|).
Select the i'th combination from X as the result.
This solution is problematic when the n-choose-m is large, as enumerating and storing all possible combinations will be extremely costly. Hence an efficient online generating solution is sought.
Note: The following is a C++ implementation of the solution provided by pentadecagon
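#include <algorithm>
#include <random>
#include <vector>

std::vector<int> generate_random(const int n, const int m, const int k)
{
    // No valid result unless m values spaced k apart fit into [0, n].
    if ((m <= 0) || (k < 0) || (n < (m - 1) * k))
        return std::vector<int>();
    std::random_device source;
    std::mt19937 generator(source());
    std::uniform_int_distribution<> distribution(0, n - (m - 1) * k);
    std::vector<int> result_list;
    result_list.reserve(m);
    // Draw m values (duplicates allowed) from the shrunken range.
    for (int i = 0; i < m; ++i)
    {
        result_list.push_back(distribution(generator));
    }
    std::sort(std::begin(result_list),std::end(result_list));
    // Spread the sorted values apart so neighbours differ by at least k.
    for (int i = 0; i < m; ++i)
    {
        result_list[i] += (i * k);
    }
    return result_list;
}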
http://ideone.com/KOeR4R
EDIT: I adapted the text for the requirement to create ordered sequences, each with the same probability.
Create random numbers a_i for i=0..M-1 without duplicates. Sort them. Then create numbers
b_i=a_i + i*(K-1)
Given the construction, those numbers b_i have the required gaps, because the a_i already have gaps of at least 1. In order to make sure those b values cover exactly the required range [1..N], you must ensure a_i are picked from a range [1..N-(M-1)*(K-1)]. This way you get truly independent numbers. Well, as independent as possible given the required gap. Because of the sorting you get O(M log M) performance again, but this shouldn't be too bad. Sorting is typically very fast. In Python it looks like this:
import random
def random_list( N, M, K ):
    s = set()
    while len(s) < M:
        s.add( random.randint( 1, N-(M-1)*(K-1) ) )
    res = sorted( s )
    for i in range(M):
        res[i] += i * (K-1)
    return res
First off: this will be an attempt to show that there's a bijection between the (M+1)- compositions (with the slight modification that we will allow addends to be 0) of the value N - (M-1)*K and the valid solutions to your problem. After that, we only have to pick one of those compositions uniformly at random and apply the bijection.
Bijection:
Let
Then the xi form an M+1-composition (with 0 addends allowed) of the value on the left (notice that the xi do not have to be monotonically increasing!).
From this we get a valid solution
by setting the values mi as follows:
We see that the distance between mi and mi + 1 is at least K, and mM is at most N (compare the choice of the composition we started out with). This means that every (M+1)-composition that fulfills the conditions above defines exactly one valid solution to your problem. (You'll notice that we only use the xM as a way to make the sum turn out right, we don't use it for the construction of the mi.)
To see that this gives a bijection, we need to see that the construction can be reversed; for this purpose, let
be a given solution fulfilling your conditions. To get the composition this is constructed from, define the xi as follows:
Now first, all xi are at least 0, so that's alright. To see that they form a valid composition (again, every xi is allowed to be 0) of the value given above, consider:
The third equality follows since we have this telescoping sum that cancels out almost all mi.
So we've seen that the described construction gives a bijection between the described compositions of N - (M-1)*K and the valid solutions to your problem. All we have to do now is pick one of those compositions uniformly at random and apply the construction to get a solution.
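The formulas of this argument did not survive in the text above. The following is one reconstruction that is consistent with the surrounding prose; the 1-based indexing of the m_i and the exact form of each line are assumptions, not the author's original notation.

```latex
% Composition (addends may be 0):
x_0 + x_1 + \dots + x_M = N - (M-1)K, \qquad x_i \ge 0 .

% Solution built from the composition:
0 \le m_1 < m_2 < \dots < m_M \le N, \qquad
m_i = \sum_{j=0}^{i-1} x_j + (i-1)K \quad (1 \le i \le M),
% so that m_{i+1} - m_i = x_i + K \ge K and m_M = N - x_M \le N .

% Inverse direction, for a given valid solution m_1, \dots, m_M:
x_0 = m_1, \qquad
x_i = m_{i+1} - m_i - K \quad (1 \le i \le M-1), \qquad
x_M = N - m_M .

% The x_i sum to the required value (telescoping sum):
\sum_{i=0}^{M} x_i
  = m_1 + \sum_{i=1}^{M-1} \left( m_{i+1} - m_i - K \right) + (N - m_M)
  = m_1 + (m_M - m_1) - (M-1)K + N - m_M
  = N - (M-1)K .
```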
Picking a composition uniformly at random
Each of the described compositions can be uniquely identified in the following way (compare this for illustration): reserve N - (M-1)*K spaces for the unary notation of that value, and another M spaces for M commas. We get an (M+1)-composition of N - (M-1)*K by choosing M of the N - (M-1)*K + M spaces, putting commas there, and filling the rest with |. Then let x0 be the number of | before the first comma, xM the number of | after the last comma, and all other xi the number of | between commas i and i+1. So all we have to do is pick an M-element subset of the integer interval [1; N - (M-1)*K + M] uniformly at random, which we can do for example with the Fisher-Yates shuffle in O(N + M log M) (we need to sort the M delimiters to build the composition) since M*K needs to be in O(N) for any solutions to exist. So if N is bigger than M by at least a logarithmic factor, then this is linear in N.
Note: @DavidEisenstat suggested that there are more space efficient ways of picking the M-element subset of that interval; I'm not aware of any, I'm afraid.
You can get an error-proof algorithm out of this by doing the simple input validation we get from the construction above that N ≥ (M-1) * K and that all three values are at least 1 (or 0, if you define the empty set as a valid solution for that case).
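A sketch of the construction in C++, under stated assumptions: instead of the Fisher-Yates shuffle mentioned above, the M comma positions are drawn with Robert Floyd's sampling algorithm, which needs only O(M) memory; values are taken from [0, N]; and the names sample_distinct and spaced_sample are invented for this example. With sorted comma positions p_1 < ... < p_M, the number of | before comma i is p_i - i, so the i-th result is p_i - i + (i-1)*K.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <set>
#include <vector>

// Floyd's algorithm: `count` distinct integers drawn uniformly from
// [1, limit], returned in ascending order. Requires 0 <= count <= limit.
template <class Rng>
std::vector<std::int64_t> sample_distinct(std::int64_t limit, std::int64_t count, Rng& rng) {
    std::set<std::int64_t> chosen;
    for (std::int64_t j = limit - count + 1; j <= limit; ++j) {
        std::uniform_int_distribution<std::int64_t> pick(1, j);
        const std::int64_t t = pick(rng);
        if (!chosen.insert(t).second) {
            chosen.insert(j);  // t was already taken, so take j instead
        }
    }
    return std::vector<std::int64_t>(chosen.begin(), chosen.end());
}

// M sorted integers in [0, N] with pairwise gaps of at least K.
// Returns an empty vector when the input admits no solution.
template <class Rng>
std::vector<std::int64_t> spaced_sample(std::int64_t N, std::int64_t M, std::int64_t K, Rng& rng) {
    const std::int64_t total = N - (M - 1) * K;  // the value being composed
    if (M < 1 || K < 1 || total < 0) {
        return {};
    }
    // Comma positions among total + M slots (stars and bars).
    std::vector<std::int64_t> picks = sample_distinct(total + M, M, rng);
    for (std::size_t j = 0; j < picks.size(); ++j) {
        const auto i = static_cast<std::int64_t>(j);
        // Stars before this comma, plus the mandatory gaps so far.
        picks[j] = picks[j] - (i + 1) + i * K;
    }
    return picks;
}
```

A call could look like `std::mt19937_64 rng(std::random_device{}()); auto r = spaced_sample(100000000, 1000, 1000, rng);` for the example in the question.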
Why not do this:
for (int i = 0; i < M; ++i) {
    pick a random number between K and N/M
    add this number to (N/M) * i;
}
Now you have M random numbers, distributed evenly along N, all of which have a difference of at least K. It's in O(n) time. As an added bonus, it's already sorted. :-)
EDIT:
Actually, the "pick a random number" part shouldn't be between K and N/M, but between min(K, [K - (N/M * i - previous value)]). That would ensure that the differences are still at least K, and not exclude values that should not be missed.
Second EDIT:
Well, the first case shouldn't be between K and N/M - it should be between 0 and N/M. Just like you need special casing for when you get close to the N/M*i border, we need special initial casing.
Aside from that, the issue you brought up in your comments was fair representation, and you're right. As my pseudocode is presented, it currently completely misses the excess between N/M*M and N. It's another edge case; simply change the random values of your last range.
Now, in this case, your distribution will be different for the last range. Since you have more numbers, you have slightly less chance for each number than you do for all the other ranges. My understanding is that because you're using ">>", this shouldn't really impact the distribution, i.e. the difference in size in the sample set should be nominal. But if you want to make it more fair, you divide the excess equally among each range. This makes your initial range calculation more complex - you'll have to augment each range based on how much remainder there is divided by M.
There are lots of special cases to look out for, but they're all able to be handled. I kept the pseudocode very basic just to make sure that the general concept came through clearly. If nothing else, it should be a good starting point.
Third and Final EDIT:
For those worried that the distribution has a forced evenness, I still claim that there's nothing saying it can't. The selection is uniformly distributed in each segment. There is a linear way to keep it uneven, but that also has a trade-off: if one value is selected extremely high (which should be unlikely given a very large N), then all the other values are constrained:
int prevValue = 0;
int maxRange;
for (int i = 0; i < M; ++i) {
maxRange = N - (((M - 1) - i) * K) - prevValue;
int nextValue = random(0, maxRange);
prevValue += nextValue;
store previous value;
prevValue += K;
}
This is still linear and random and allows unevenness, but the bigger prevValue gets, the more constrained the other numbers become. Personally, I prefer my second edit answer, but this is an available option that given a large enough N is likely to satisfy all the posted requirements.
Come to think of it, here's one other idea. It requires a lot more data maintenance, but is still O(M) and is probably the most fair distribution:
What you need to do is maintain a vector of your valid data ranges and a vector of probability scales. A valid data range is just the list of high-low values where K is still valid. The idea is you first use the scaled probability to pick a random data range, then you randomly pick a value within that range. You remove the old valid data range and replace it with 0, 1 or 2 new data ranges in the same position, depending on how many are still valid. All of these actions are constant time other than handling the weighted probability, which is O(M), done in a loop M times, so the total should be O(M^2), which should be much better than O(NlogN) because N >> M.
Rather than pseudocode, let me work an example using OP's original example:
0th iteration: valid data ranges are from [0...100Mill], and the weight for this range is 1.0.
1st iteration: Randomly pick one element in the one element vector, then randomly pick one element in that range.
If the element is, e.g. 12345678, then we remove the [0...100Mill] and replace it with [0...12344678] and [12346678...100Mill]
If the element is, e.g. 500, then we remove the [0...100Mill] and replace it with just [1500...100Mill], since [0...500] is no longer a valid range. The only time we will replace it with 0 ranges is in the unlikely event that you have a range with only one number in it and it gets picked. (In that case, you'll have 3 numbers in a row that are exactly K apart from each other.)
The weight for the ranges are their length over the total length, e.g. 12344678/(12344678 + (100Mill - 12346678)) and (100Mill - 12346678)/(12344678 + (100Mill - 12346678))
In the next iterations, you do the same thing: randomly pick a number between 0 and 1 and determine which of the ranges that scale falls into. Then randomly pick a number in that range, and replace your ranges and scales.
By the time it's done, we're no longer acting in O(M), but we're still only dependent on the time of M instead of N. And this actually is both uniform and fair distribution.
Hope one of these ideas works for you!

What are practical uses for STL's 'partial_sum'?

What/where are the practical uses of the partial_sum algorithm in STL?
What are some other interesting/non-trivial examples or use-cases?
I used it to reduce memory usage of a simple mark-sweep garbage collector in my toy lambda calculus interpreter.
The GC pool is an array of objects of identical size. The goal is to eliminate objects that aren't linked to other objects, and condense the remaining objects into the beginning of the array. Since the objects are moved in memory, each link needs to be updated. This necessitates an object remapping table.
partial_sum allows the table to be stored in compressed format (as little as one bit per object) until the sweep is complete and memory has been freed. Since the objects are small, this significantly reduces memory use.
Recursively mark used objects and populate the Boolean array.
Use remove_if to condense the marked objects to the beginning of the pool.
Use partial_sum over the Boolean values to generate a table of pointers/indexes into the new pool.
This works because the Nth marked object has N preceding 1's in the array and acquires pool index N.
Sweep over the pool again and replace each link using the remap table.
It's especially friendly to the data cache to put the remap table in the just-freed, thus still hot, memory.
One thing to note about partial sum is that it is the operation that undoes adjacent difference much like - undoes +. Or better yet if you remember calculus the way differentiation undoes integration. Better because adjacent difference is essentially differentiation and partial sum is integration.
Let's say you have simulation of a car and at each time step you need to know the position, velocity, and acceleration. You only need to store one of those values as you can compute the other two. Say you store the position at each time step you can take the adjacent difference of the position to give the velocity and the adjacent difference of the velocity to give the acceleration. Alternatively, if you store the acceleration you can take the partial sum to give the velocity and the partial sum of the velocity gives the position.
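A small sketch of that inverse relationship, using made-up helper names (differences, running_sums). Note that std::adjacent_difference copies the first element unchanged, which is what lets std::partial_sum restore the original sequence exactly.

```cpp
#include <numeric>
#include <vector>

// out[0] = xs[0], out[i] = xs[i] - xs[i-1]  ("differentiation")
std::vector<int> differences(const std::vector<int>& xs) {
    std::vector<int> out(xs.size());
    std::adjacent_difference(xs.begin(), xs.end(), out.begin());
    return out;
}

// out[i] = xs[0] + ... + xs[i]  ("integration")
std::vector<int> running_sums(const std::vector<int>& xs) {
    std::vector<int> out(xs.size());
    std::partial_sum(xs.begin(), xs.end(), out.begin());
    return out;
}
```

With positions 0, 1, 4, 9, 16 the differences are 0, 1, 3, 5, 7, and taking running sums of those gives the positions back.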
Partial sum is one of those functions that doesn't come up too often for most people but is enormously useful when you find the right situation. A lot like calculus.
Last time I (would have) used it is when converting a discrete probability distribution (an array of p(X = k)) into a cumulative distribution (an array of p(X <= k)). To select once from the distribution, you can pick a number from [0-1) randomly, then binary search into the cumulative distribution.
That code wasn't in C++, though, so I did the partial sum myself.
You can use it to generate a monotonically increasing sequence of numbers. For example, the following generates a vector containing the numbers 1 through 42:
std::vector<int> v(42, 1);
std::partial_sum(v.begin(), v.end(), v.begin());
Is this an everyday use case? Probably not, though I've found it useful on several occasions.
You can also use std::partial_sum to generate a list of factorials. (This is even less useful, though, since the number of factorials that can be represented by a typical integer data type is quite limited. It is fun, though :-D)
std::vector<int> v(10, 1);
std::partial_sum(v.begin(), v.end(), v.begin());
std::partial_sum(v.begin(), v.end(), v.begin(), std::multiplies<int>());
Personal Use Case: Roulette-Wheel-Selection
I'm using partial_sum in a roulette-wheel selection algorithm. This algorithm randomly chooses elements from a container with a probability proportional to some value given beforehand.
Because the elements to choose from carry values that are not necessarily normalized, I use partial_sum to construct something like a "roulette wheel" by summing up all the elements. Then I pick a random number in this range (the last partial sum is the sum of all values) and use std::lower_bound to search "the wheel" for the spot where my random number landed. The element returned by lower_bound is the chosen one.
Besides the advantage of clear and expressive code with the use of partial_sum, I could also gain some speed when experimenting with the GCC parallel mode, which brings parallelized versions of some algorithms, partial_sum being one of them.
Another use I know of: One of the most important algorithmic primitives in parallel processing (but maybe a little bit away from STL)
If you're interested in heavily optimized algorithms that use partial_sum (in this case you may find more results under the synonyms "scan" or "prefix sum"), then go to the parallel algorithms community. They need it all the time. You won't find a parallel sorting algorithm based on quicksort or mergesort without using it. This operation is one of the most important parallel primitives used. I think it is most commonly used for calculating offsets in dynamic algorithms. Think of a partition step in quicksort, which is split and fed to the parallel threads. You don't know the number of elements in each slot of the partition before calculating it. So you need some offsets for all the threads for later access.
Maybe you will find more information in the now-hot topic of GPU processing. One short article on Nvidia's CUDA and the scan primitive, with a few application examples, is Chapter 39, "Parallel Prefix Sum (Scan) with CUDA".
Personal Use Case: intermediate step in counting sort from CLRS:
COUNTING_SORT (A, B, k)
    for i ← 1 to k do
        c[i] ← 0
    for j ← 1 to n do
        c[A[j]] ← c[A[j]] + 1
    // c[i] now contains the number of elements equal to i
    // std::partial_sum here
    for i ← 2 to k do
        c[i] ← c[i] + c[i-1]
    // c[i] now contains the number of elements ≤ i
    for j ← n downto 1 do
        B[c[A[j]]] ← A[j]
        c[A[j]] ← c[A[j]] - 1
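For reference, the pseudocode above could be sketched in C++ roughly as follows. This is only an illustration under some assumptions: the name counting_sort and its signature are mine, values are unsigned integers in [0, k], and indexing is zero-based (so the counter is decremented before placing, rather than after as in the one-based CLRS version).

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Stable counting sort for values in the range [0, k] (hypothetical helper).
std::vector<unsigned> counting_sort(const std::vector<unsigned>& a, unsigned k) {
    std::vector<std::size_t> c(k + 1, 0);
    for (unsigned x : a) {
        assert(x <= k);                                // precondition: values fit in [0, k]
        ++c[x];                                        // c[i] = number of elements equal to i
    }
    std::partial_sum(c.begin(), c.end(), c.begin());   // c[i] = number of elements <= i
    std::vector<unsigned> b(a.size());
    for (std::size_t j = a.size(); j-- > 0; ) {        // walk backwards to keep the sort stable
        b[--c[a[j]]] = a[j];                           // zero-based: decrement, then place
    }
    return b;
}
```

The single std::partial_sum call replaces the "for i ← 2 to k" loop; everything else maps line for line onto the pseudocode.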
You could build a "moving sum" (precursor to a moving average):
template <class T>
void moving_sum (const vector<T>& in, int num, vector<T>& out)
{
    // cumulative sum (out must already be at least as large as in)
    partial_sum (in.begin(), in.end(), out.begin());
    // shift and subtract, back to front so out[j] is still a cumulative sum
    for (int i = out.size() - 1; i >= 0; i--) {
        int j = i - num;
        if (j >= 0)
            out[i] -= out[j];
    }
}
And then call it with:
vector<double> v(10);
// fill in v
vector<double> v2 (v.size());
moving_sum (v, 3, v2);
You know, I actually did use partial_sum() once... It was this interesting little problem that I was asked on a job interview. I enjoyed it so much, I went home and coded it up.
The problem was: Given a sequence of integers, find the shortest contiguous sub-sequence with the highest sum. E.g. Given:
Value: -1 2 3 -1 4 -2 -4 5
Index: 0 1 2 3 4 5 6 7
We would find the subsequence [1,4]
Now the obvious solution is to run with 3 for loops, iterating over all possible starts & ends, and adding up the value of each possible subsequence in turn. Inefficient, but quick to code up and hard to make mistakes. (Especially when the third for loop is just accumulate(start,end,0).)
The correct solution involves a divide-and-conquer / bottom up approach. E.g. Divide the problem space in half, and for each half compute the largest subsequence contained within that section, the largest subsequence including the starting number, the largest subsequence including the ending number, and the entire section's subsequence. Armed with this data we can then combine the two halves together without any further evaluation of either one. Obviously the data for each half can be computed by further breaking each half into halves (quarters), each quarter into halves (eighths), and so on until we have trivial singleton cases. It's all quite efficient.
But all that aside, there's a third (somewhat less efficient) option that I wanted to explore. It's similar to the 3-for-loop case, only we add the adjacent numbers to avoid so much work. The idea is that there's no need to add a+b, a+b+c, and a+b+c+d when we can add t1=a+b, t2=t1+c, and t3=t2+d. It's a space/computation tradeoff thing. It works by transforming the sequence:
Index: 0  1  2  3   4
FROM:  1  2  3  4   5
TO:    1  3  6  10  15
Thereby giving us the sums of all possible subsequences starting at index=0 and ending at indexes=0,1,2,3,4.
Then we iterate over this set subtracting the successive possible "start" points...
FROM:  1  3  6  10  15
TO:    -  2  5  9   14
TO:    -  -  3  7   12
TO:    -  -  -  4   9
TO:    -  -  -  -   5
Thereby giving us the values (sums) of all possible subsequences.
We can find the maximum value of each iteration via max_element().
The first step is most easily accomplished via partial_sum().
The remaining steps via a for loop and transform(data+i,data+size,data+i,bind2nd(minus<TYPE>(),data[i-1])).
Clearly O(N^2). But still interesting and fun...
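Here is a rough sketch of that O(N^2) variant, not the code I wrote back then. The Best struct and the max_run name are made up for illustration, and a lambda stands in for bind2nd (which was removed in C++17). It assumes a non-empty input and breaks ties in favour of the shorter run.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct Best { long sum; std::size_t first; std::size_t last; };  // inclusive indices

// O(N^2) search for the contiguous run with the highest sum (hypothetical helper).
Best max_run(std::vector<long> data) {            // by value: modified in place
    assert(!data.empty());
    Best best{data[0], 0, 0};
    std::partial_sum(data.begin(), data.end(), data.begin());
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (i > 0) {
            const long drop = data[i - 1];        // original value at index i-1
            std::transform(data.begin() + i, data.end(), data.begin() + i,
                           [drop](long s) { return s - drop; });
        }
        // data[j] for j >= i now holds the sum of the original values i..j
        const auto it = std::max_element(data.begin() + i, data.end());
        const std::size_t last = static_cast<std::size_t>(it - data.begin());
        const bool shorter = (last - i) < (best.last - best.first);
        if (*it > best.sum || (*it == best.sum && shorter)) {
            best = Best{*it, i, last};
        }
    }
    return best;
}
```

On the example sequence -1 2 3 -1 4 -2 -4 5 this returns the run [1,4] with sum 8. std::max_element returns the first maximum, which for a fixed start index is also the shortest run.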
Partial sums are often useful in parallel algorithms. Consider the code
for (int i=0; N>i; ++i) {
sum += x[i];
do_something(sum);
}
If you want to parallelise this code, you need to know the partial sums. I am using GNU's parallel version of partial_sum for something very similar.
I often use partial sum not to sum but to calculate the current value in the sequence depending on the previous.
For example, when you integrate a function, each new step is computed from the previous step: vt += dvdt or vt = integrate_step(dvdt, t_prev, t_prev+dt);.
In nonparametric Bayesian methods there is a Metropolis-Hastings step (per observation) that decides whether to sample a new or an existing cluster. If an existing cluster has to be sampled, this needs to be done with different weights. These weighted likelihoods are simulated in the following example code.
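As a small illustration of the integration case, std::partial_sum accepts a binary operation in place of +, so an explicit Euler update can be written as below. The function name integrate and the convention that samples[0] holds the initial value are assumptions made for this sketch only.

```cpp
#include <numeric>
#include <vector>

// Explicit Euler steps v[t] = v[t-1] + dvdt[t] * dt, expressed as a partial_sum.
// samples[0] is taken to be the initial value v0; samples[1..] are derivatives.
// (Hypothetical helper, for illustration.)
std::vector<double> integrate(const std::vector<double>& samples, double dt) {
    std::vector<double> v(samples.size());
    std::partial_sum(samples.begin(), samples.end(), v.begin(),
                     [dt](double prev, double dvdt) { return prev + dvdt * dt; });
    return v;
}
```

The first output is always a plain copy of the first input, and the operation receives the running value as its first argument and the next input element as its second.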
#include <algorithm>
#include <iostream>
#include <iterator>
#include <numeric>
#include <random>
#include <vector>
int main() {
std::default_random_engine generator(std::random_device{}());
std::uniform_real_distribution<double> distribution(0.0,1.0);
int K = 8;
std::vector<double> weighted_likelihood(K);
for (int i = 0; i < K; ++i) {
weighted_likelihood[i] = i*10;
}
std::cout << "Weighted likelihood: ";
for (auto i: weighted_likelihood) std::cout << i << ' ';
std::cout << std::endl;
std::vector<double> cumsum_likelihood(K);
std::partial_sum(weighted_likelihood.begin(), weighted_likelihood.end(), cumsum_likelihood.begin());
std::cout << "Cumulative sum of weighted likelihood: ";
for (auto i: cumsum_likelihood) std::cout << i << ' ';
std::cout << std::endl;
std::vector<int> frequency(K);
int N = 280000;
for (int i = 0; i < N; ++i) {
double pick = distribution(generator) * cumsum_likelihood.back();
auto lower = std::lower_bound(cumsum_likelihood.begin(), cumsum_likelihood.end(), pick);
int index = std::distance(cumsum_likelihood.begin(), lower);
frequency[index]++;
}
std::cout << "Frequencies: ";
for (auto i: frequency) std::cout << i << ' ';
std::cout << std::endl;
}
Note that this is not different from the answer by https://stackoverflow.com/users/13005/steve-jessop. It's added to give a bit more context about a particular situation (nonparametric Bayesian methods, e.g. the algorithms by Neal using the Dirichlet process as prior) and to show actual code that uses partial_sum in combination with lower_bound.

O(log n) algorithm to find the element having rank i in union of pre-sorted lists

Given two sorted lists, each containing n real numbers, is there an O(log n) time algorithm to compute the element of rank i (where i corresponds to the index in increasing order) in the union of the two lists, assuming the elements of the two lists are distinct?
EDIT:
@Ben: This is what I have been doing, but I am still not getting it.
I have an example:
List A : 1, 3, 5, 7
List B : 2, 4, 6, 8
Find rank(i) = 4.
First Step : i/2 = 2;
List A now contains A: 1, 3
List B now contains B: 2, 4
Compare A[i] to B[i], i.e.
A[i] is less;
So the lists now become :
A: 3
B: 2,4
Second Step:
i/2 = 1
List A now contains A: 3
List B now contains B: 2
Now I have lost the value 4, which is actually the result...
I know I am missing something, but even after close to a day of thinking I just can't figure this one out...
Yes:
You know the element lies within either index [0,i] of the first list or [0,i] of the second list. Take element i/2 from each list and compare. Proceed by bisection.
I'm not including any code because this problem sounds a lot like homework.
EDIT: Bisection is the method behind binary search. It works like this:
Assume i = 10; (zero-based indexing, we're looking for the 11th element overall).
On the first step, you know the answer is either in list1(0...10) or list2(0...10). Take a = list1(5) and b = list2(5).
If a > b, then there are 5 elements in list1 which come before a, and at least 6 elements in list2 which come before a. So a is an upper bound on the result. Likewise there are 5 elements in list2 which come before b and fewer than 6 elements in list1 which come before b. So b is a lower bound on the result. Now we know that the result is either in list1(0..5) or list2(5..10). If a < b, then the result is either in list1(5..10) or list2(0..5). And if a == b we have our answer (but the problem said the elements were distinct, therefore a != b).
We just repeat this process, cutting the size of the search space in half at each step. Bisection refers to the fact that we choose the middle element (bisector) out of the range we know includes the result.
So the only difference between this and binary search is that in binary search we compare to a value we're looking for, but here we compare to a value from the other list.
NOTE: this is actually O(log i), which is better than (or at least no worse than) O(log n). Furthermore, for small i (perhaps i < 100), it would actually be fewer operations to merge the first i elements (linear search instead of bisection) because that is so much simpler. When you add in cache behavior and data locality, the linear search may well be faster for i up to several thousand.
Also, if i > n, rely on the fact that the result has to be toward the end of either list: your initial candidate range in each list is ((i-n)..n).
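Now that some time has passed, here is one way the bisection could be sketched in C++. It is a common variant of the idea rather than a literal transcription of the steps above: instead of narrowing a window in both lists, it probes about half of the remaining rank in each list and throws away the prefix of whichever list has the smaller probe. The function name and the use of std::vector<double> are assumptions for illustration; k is zero-based.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Element of zero-based rank k in the merged order of two sorted vectors.
// Requires k < a.size() + b.size(). (Hypothetical helper.)
double kth_of_two_sorted(const std::vector<double>& a,
                         const std::vector<double>& b, std::size_t k) {
    assert(k < a.size() + b.size());
    std::size_t ia = 0, ib = 0;   // elements before these offsets are ruled out
    std::size_t need = k + 1;     // the answer is the need-th smallest remaining element
    while (true) {
        if (ia == a.size()) return b[ib + need - 1];
        if (ib == b.size()) return a[ia + need - 1];
        if (need == 1) return std::min(a[ia], b[ib]);
        const std::size_t half = need / 2;
        const std::size_t na = std::min(half, a.size() - ia);
        const std::size_t nb = std::min(half, b.size() - ib);
        if (a[ia + na - 1] <= b[ib + nb - 1]) {
            ia += na;             // these na elements all rank below the answer
            need -= na;
        } else {
            ib += nb;             // likewise for the nb elements of b
            need -= nb;
        }
    }
}
```

For A = 1, 3, 5, 7 and B = 2, 4, 6, 8, the fourth element overall (k = 3) comes out as 4. Only the list with the smaller probe loses elements at each step and the other list is kept whole, so a value like 4 is never cut away.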
Here is how you do it.
Let the first list be ListX and the second list be ListY. We need to find the right combination of ListX[x] and ListY[y] where x + y = i. Since x, y, i are natural numbers we can immediately constrain our problem domain to x*y. And by using the equations max(x) = len(ListX) and max(y) = len(ListY) we now have a subset of x*y elements in the form [x, y] that we need to search.
What you will do is order those elements like so: [i - max(y), max(y)], [i - max(y) + 1, max(y) - 1], ... , [max(x), i - max(x)]. You will then bisect this list by choosing the middle [x, y] combination. Since the lists are ordered and distinct you can test ListX[x] < ListY[y]. If true, then we bisect the upper half of our [x, y] combinations; if false, we bisect the lower half. You keep bisecting until you find the right combination.
There are a lot of details I left out, but that is the general gist of it. It is indeed O(log(n))!
Edit: As Ben pointed out, this is actually O(log(i)). If we let n = len(ListX) + len(ListY) then we know that i <= n.
When merging two lists, you're going to have to touch every element in both lists. If you don't touch every element, some elements will be left behind. Thus your theoretical lower bound is O(n). So you can't do it that way.
You don't have to sort, since you have two lists that are already sorted, and you can maintain that ordering as part of the merge.
edit: oops, I misread the question. I thought given value, you want to find rank, not the other way around. If you want to find rank given value, then this is how to do it in O(log N):
Yes, you can do this in O(log N), if the list allows O(1) random access (i.e. it's an array and not a linked list).
Binary search on L1
Binary search on L2
Sum the indices
You'd have to work out the math, +1, -1, what to do if element isn't found, etc, but that's the idea.
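A minimal sketch of the rank-from-value direction, assuming both lists are sorted std::vector<double> and that "rank" means the zero-based position the value would take in the merged order (the name rank_in_union is made up):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Zero-based rank of `value` in the merged order of two sorted vectors,
// i.e. the number of elements strictly less than `value` across both.
// (Hypothetical helper.)
std::size_t rank_in_union(const std::vector<double>& l1,
                          const std::vector<double>& l2, double value) {
    const auto below1 = std::lower_bound(l1.begin(), l1.end(), value) - l1.begin();
    const auto below2 = std::lower_bound(l2.begin(), l2.end(), value) - l2.begin();
    return static_cast<std::size_t>(below1 + below2);
}
```

std::lower_bound returns the first position whose element is not less than value, so the two offsets simply add up, and a value that is in neither list gets the rank it would have if inserted.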