C++ Priority Queue - ordering intervals

We are given an initial time interval. Each time, we pick the largest interval and split it into two halves. If there is a tie, we pick the interval with the lowest starting point.
For example, take [0,9].
First split: P1 [0,4] and P2 [4,9].
For the second split:
dist(P1) = 3 => if we pick P1, the new intervals would be [0,2] and [2,4].
dist(P2) = 4 => if we pick P2, the new intervals are [4,6] and [6,9].
In both cases, we have to create a sub-interval of distance 1. So it's a tie, and we pick P1 as P1 < P2.
[0,2], [2, 4], [4, 9]
Third Split:
[0,2], [2, 4], [4,6], [6,9]
Fourth Split:
There is a tie, so we pick [0,2].
[0,1], [1,2], [2,4], [4,6], [6, 9]
Fifth Split:
[0,1], [1,2], [2,3], [3,4], [4,6], [6,9]
A possible candidate to be on top: [4,6].
But I always get [1,2] on top.
#include <cstdlib>
#include <iostream>
#include <queue>
using namespace std;
int main()
{
    auto dist{ [](const auto& p) {
        return p.second - p.first - 1;
    } };
    auto comp{
        [&dist](const auto& p1, const auto& p2) {
            if (abs(dist(p1) - dist(p2)) <= 1) {
                return p1.first > p2.first;
            }
            return dist(p1) < dist(p2);
        }
    };
    priority_queue<pair<int, int>, vector<pair<int, int>>, decltype(comp)> maxQ{comp};
    maxQ.push({ 0, 9 }); // initial interval
    for (int i{ 0 }; i < 5; ++i) {
        auto ii{ maxQ.top() };
        maxQ.pop();
        int mid = (ii.first + ii.second) / 2;
        maxQ.push({ ii.first, mid });
        maxQ.push({ mid, ii.second });
    }
    while (!maxQ.empty()) {
        auto& ii{ maxQ.top() };
        cout << ii.first << " : " << ii.second << endl;
        maxQ.pop();
    }
}
I'm getting the following output :
1 : 2
6 : 9
0 : 1
2 : 3
3 : 4
4 : 6
IMO, the 1 : 2 interval shouldn't be on top. Could someone help me understand why this happens?

It turns out that this issue has more to do with how priority queue comparators are designed; refer to The reason of using `std::greater` for creating min heap via `priority_queue`.
The gist of it is that when two nodes are compared, if the comparator returns true, p1 will fall below p2. So a basic ">" comparator will have smaller nodes at the top and bigger nodes at the bottom.
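As a minimal illustration of that rule (a standalone toy example, nothing specific to your code): with std::greater as the comparator, "greater" elements sink, so the smallest element ends up on top - a min-heap.
#include <functional>
#include <iostream>
#include <queue>
#include <vector>
int main() {
    // comp(a, b) == true means a sinks below b, so with std::greater
    // the smallest element rises to the top: a min-heap.
    std::priority_queue<int, std::vector<int>, std::greater<int>> minHeap;
    minHeap.push(3);
    minHeap.push(1);
    minHeap.push(2);
    std::cout << minHeap.top() << '\n'; // prints 1
}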
To visualize the issue, I ran through it in a debugger; this is the moment at which (1,2) is put above (6,9). This is the current state of the priority queue:
2 : 4
6 : 9
4 : 6
0 : 1
1 : 2
We see that (2,4) is in front of (6,9), which is expected, since our comparison function says that (2,4) < (6,9), as explained above.
Then the code pops the top of the priority queue, meaning it replaces (2,4) with the new biggest interval. The way C++ priority queues do this is by swapping the first and last elements of the heap and then reducing its size by 1 (so we lose the original first element).
So after the swap and size reduction, our heap looks like this:
1 : 2
6 : 9
4 : 6
0 : 1
Then, since the previously deemed smallest element is now at the top of the queue, we need to find its rightful spot.
So (1,2) is going to look at its children, (6,9) and (4,6), and see which is more important.
With our comparison operator, (4,6) is the more important node.
It then compares (1,2) with the more important of those two nodes, (4,6), to see if it needs to perform a swap to make the queue valid.
It then finds that (1,2) is more important because 1 < 4. Thus, (1,2) stays in its spot and we're left with:
1 : 2
6 : 9
4 : 6
0 : 1

We can plainly see that [1,2] is ordered before [4,6], by plugging it into your comparator:
comp([1,2], [4,6])
if (abs(dist([1,2]) - dist([4,6])) <= 1) { // abs(0 - 1) <= 1 or 1 <= 1
    return 1 > 4; // false
}
return dist([1,2]) < dist([4,6]); // not reached
Only you can correct the comparator to achieve whatever your goal is here, but the existing code is wrong if you want [1,2] to be ordered after [4,6].
At a guess, though, based on your description, you might try:
if (abs(dist(p1)) == abs(dist(p2)))
But I'd go to some lengths to ensure that your ordering is strict weak as it must be for the container. Sprinkling some more abs around may help.
Overall this is quite a complex comparator that's not easy to understand at a glance.

I think it is because the ordering of intervals is not strict.
e.g. P1(0,1), P2(4,6) and P3(6,9)
P1 should come before P2.
P2 should come before P3.
P3 should come before P1.
That's crazy. How could I set a strict pair ordering here?
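For what it's worth, here is a sketch of one way to restore a strict weak ordering. It assumes the intended tie rule is that two intervals tie exactly when the smaller half produced by splitting them has the same size - that reading of your example is my assumption, so adjust the key if your rule differs. Comparing an integer key and then the start point lexicographically is strict by construction, and with your input it ends with [4,6] on top:
#include <iostream>
#include <queue>
#include <utility>
#include <vector>
using namespace std;
int main()
{
    // Key: size of the smaller half produced by splitting at the midpoint.
    auto key{ [](const auto& p) {
        return (p.second - p.first) / 2 - 1;
    } };
    // Lexicographic comparison on (key, start) is a strict weak ordering.
    auto comp{ [&key](const auto& p1, const auto& p2) {
        if (key(p1) != key(p2)) {
            return key(p1) < key(p2); // smaller key sinks
        }
        return p1.first > p2.first;   // tie: larger start sinks
    } };
    priority_queue<pair<int, int>, vector<pair<int, int>>, decltype(comp)> maxQ{comp};
    maxQ.push({ 0, 9 });
    for (int i{ 0 }; i < 5; ++i) {
        auto ii{ maxQ.top() };
        maxQ.pop();
        int mid = (ii.first + ii.second) / 2;
        maxQ.push({ ii.first, mid });
        maxQ.push({ mid, ii.second });
    }
    while (!maxQ.empty()) {
        auto ii{ maxQ.top() };
        cout << ii.first << " : " << ii.second << '\n';
        maxQ.pop();
    }
}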


Sorting pairs of elements from vectors to maximize a function

I am working on a vector sorting algorithm for my personal particle physics studies but I am very new to coding.
Going through individual scenarios (specific vector sizes and combinations) by brute force becomes extremely chaotic for greater numbers of net vector elements, especially since this whole code will be looped up to 1e5 times.
Take four vectors of 'flavors' A and B: A+, A-, B+, and B-. I need to find two total pairs of elements such that some value k(V+, V-) is maximized with the restriction that different flavors cannot be combined! (V is just a flavor placeholder)
For example:
A+ = {a1+}
A- = {a1-}
B+ = {b1+, b2+}
B- = {b1-}
Since A+ and A- only have one element each, the value k(A+, A-) -> k(a1+, a1-). But for flavor B, there are two possible combinations.
k(b1+, b1-) OR k(b2+, b1-)
I would like to ensure that the combination of elements with the greater value of k is retained. As I said previously, this specific example is not TOO bad by brute force, but say B+ and B- had two elements each? The possible values would be:
k(b1+, b1-) or k(b2+,b2-) or k(b1+, b2-) or k(b2+, b1-)
where only one of these is correct. Furthermore, say two of those four B+B- combinations had greater k than that of A+A-. This would also be valid!
Any help would be appreciated!!! I can clarify if anything above is overly confusing!
I tried something like this,
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
static bool sortbypair(const pair<double, double>& a, const pair<double, double>& b)
{
    return (k(a.first, a.second) > k(b.first, b.second)) && k(a.first, b.second) < k(a.second, b.first);
}
But I can't flesh it out.
If I understand your question correctly,
you have a function k which maps two doubles (or a std::pair<double, double>) to a single double. I am assuming double; it wasn't clear from the question. It also isn't strictly at the core of your problem.
you have four std::vector<double>s: aplus, aminus, bplus and bminus.
Your domain is all std::pair<double, double>s that you can form by combining the elements in aplus and aminus, as well as all combinations from bplus and bminus, respectively.
you want to either
find the pair in your domain that maximizes k
get a collection of all pairs in your domain, sorted by the value of k
Did I get this right? You state in your question
I need to find two total pairs of elements such that some value k(V+, V-) is maximized [...]
which confuses me a bit.
My suggestion is to break down your problem into three subtasks:
Create a range of all combinations of elements in the vectors Vplus and Vminus. This is often denoted as a cartesian product Vplus x Vminus.
Concatenate the ranges created in step 1 for aplus x aminus and bplus x bminus to get a range of all viable pairs in your domain.
Maximize/sort the range from step 2.
Implementation using range-v3
The range-v3 library provides some very convenient tools for this kind of task. Let's assume your k function looks like this:
double k(double x, double y) { return x*x + y*y; }
and your vectors look like this:
std::vector<double> ap{0., 4., 2., 3., 1.};
std::vector<double> am{2., -1.};
std::vector<double> bp{1., 0.5};
std::vector<double> bm{-1., 2.};
Let's define a range representing our domain:
using namespace ranges;
auto pairs_view = view::concat(
    view::cartesian_product(ap, am),
    view::cartesian_product(bp, bm)
);
The pairs_view instance doesn't actually create the pairs anywhere in memory. It is just an adaptor object that lets you iterate over all pairs that you can construct in the specified way. The pairs are created "lazily", on the fly, as you - or an algorithm - iterate over it.
Let's print all pairs from our domain:
auto print = [](auto const& p){
    auto first = std::get<0>(p);
    auto second = std::get<1>(p);
    std::cout << "[" << first << ", " << second << "] k = " << k(first, second) << std::endl;
};
for_each(pairs_view, print);
Output:
[0, 2] k = 4
[0, -1] k = 1
[4, 2] k = 20
[4, -1] k = 17
[2, 2] k = 8
[2, -1] k = 5
[3, 2] k = 13
[3, -1] k = 10
[1, 2] k = 5
[1, -1] k = 2
[1, -1] k = 2
[1, 2] k = 5
[0.5, -1] k = 1.25
[0.5, 2] k = 4.25
Finding the maximum element
Let's start by defining a convenience function (here, in the form of a lambda expression) that evaluates k for a tuple of doubles:
auto k_proj = [](auto const& p){
    return k(std::get<0>(p), std::get<1>(p));
};
You can find an iterator to the pair in your domain that maximizes k with just the single line:
auto it = max_element(pairs_view, less{}, k_proj);
print(*it);
Output:
[4, 2] k = 20
The function max_element takes two additional arguments. The first is a comparison function that returns true if two elements are in order. We provide the default less functor. The second argument is an optional projection that is applied to each element before the comparison. We pass k_proj.
Read the above line of code as "Find the element in pairs_view of which the projection onto its k value is maximal, where we want to compare the projected values with the standard less function."
Getting a sorted range of your domain
If you want a sorted range of all pairs in your domain, we must create an std::vector<std::pair<double, double>> for your domain first and then sort it. You cannot sort views created with the range-v3 library, because they are just a view into existing objects; they cannot be mutated. In addition, we have to map the special pair types created by the range-v3 library in the cartesian_product function to actual std::pair<double, double> to copy the values into our new container:
auto to_std_pair = [](auto const& p){
    return std::pair<double, double>{std::get<0>(p), std::get<1>(p)};
};
auto pairs_vec = pairs_view | view::transform(to_std_pair) | to_vector;
Note that the "pipe" operator | is short-hand notation for the function composition to_vector(view::transform(pairs_view, to_std_pair)).
The invocation of the sorting algorithm looks very similar to the invocation of the max_element algorithm:
sort(pairs_vec, less{}, k_proj);
Let's print the result:
for_each(pairs_vec, print);
Output:
[0, -1] k = 1
[0.5, -1] k = 1.25
[1, -1] k = 2
[1, -1] k = 2
[0, 2] k = 4
[0.5, 2] k = 4.25
[2, -1] k = 5
[1, 2] k = 5
[1, 2] k = 5
[2, 2] k = 8
[3, -1] k = 10
[3, 2] k = 13
[4, -1] k = 17
[4, 2] k = 20
Here is a complete live code example: https://godbolt.org/z/6zo8oj3ah
If you don't want to use the range-v3 library you have two options:
You can wait. Large parts of the range-v3 library have been added to the standard library in C++20. The relevant functions concat, cartesian_product and to_vector will presumably be added in the upcoming standard C++23.
The standard library has max_element and sort. So you could just implement the concatenation and cartesian product on your own: https://godbolt.org/z/7Y5dG16WK
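For reference, a rough plain-STL sketch of the cartesian-product step (the helper name is made up for illustration; this is not the code behind the godbolt link):
#include <utility>
#include <vector>
// Materializes all pairs (x, y) with x from xs and y from ys.
std::vector<std::pair<double, double>>
cartesian_product(const std::vector<double>& xs, const std::vector<double>& ys) {
    std::vector<std::pair<double, double>> out;
    out.reserve(xs.size() * ys.size());
    for (double x : xs)
        for (double y : ys)
            out.emplace_back(x, y);
    return out;
}
Concatenating two such vectors with insert() then gives the whole domain, after which std::max_element or std::sort apply directly.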
Thank you to everyone who commented!!! I really appreciate your effort. The solution ended up being much simpler than I was making it out to be.
Essentially, from the physics program I'm using, the particles are given in a listed form (i.e. 533 e-, 534
p+, 535 e+, etc.). I couldn't figure out how to get range-v3 working (or any external libraries for that matter, but thank you for the suggestion), so I figured out how to make a tuple out of the indices of combined particles and their associated k value.
#include <iostream>
#include <vector>
#include <algorithm>
#include <tuple>
using namespace std;

double k(double x, double y); // assumed to be defined elsewhere

static bool weirdsort(const tuple<int, int, double>& a, const tuple<int, int, double>& b)
{
    return get<2>(a) > get<2>(b);
}

int main()
{
    // '+' and '-' can't appear in identifiers, so A+ / A- are spelled Aplus / Aminus here.
    vector<double> Aplus, Aminus;
    vector<tuple<int, int, double>> net;
    // Sample ptcl list
    //
    // A+ A- B+ B-
    // 0 a1+
    // 1 a1-
    // 2 b1-
    // 3 b1+
    // 4 a2+
    // 5 a2-
    for (int i = 0; i < (int)Aplus.size(); i++)
    {
        for (int j = 0; j < (int)Aminus.size(); j++)
        {
            // store the two ptcl indices together with their k value
            net.emplace_back(i, j, k(Aplus[i], Aminus[j]));
        }
    }
    sort(net.begin(), net.end(), weirdsort);
    // Now erase any tuple (with a lower k value) that repeats a ptcl index.
    for (int i = 0; i + 1 < (int)net.size(); i++)
    {
        if (get<0>(net[i]) == get<0>(net[i + 1]) || get<1>(net[i]) == get<1>(net[i + 1]))
        {
            net.erase(net.begin() + i + 1);
            --i; // re-check the same position after erasing
        }
    }
    // Now can plot third tuple element of net[0] and net[1]
    return 0;
}
It's not quite perfect but since I'm only looking for the first two highest k values it works out just fine. Thanks again!

Adjusting index in a vector in C++

I am writing code in which the indices of a vector are displayed according to the sorted order of the elements they hold:
For example:
values -> 3 2 4 1 5
indices -> 1 2 3 4 5 //yeah I know C++ indexing starts with 0. Well, while printing, I will add 1
result -> 4 2 1 3 5 //Bear with me. I know its confusing. I will clarify below
Now, the result has been obtained by sorting the elements and printing their earlier indices.
Like:
values(sorted) -> 1 2 3 4 5
indices(before sorting) -> 4 2 1 3 5
Now, there are many ways to do it with some suggesting to store the previous values and search and print the previous index, while others suggesting to create a new vector and copy the previous indices in it and then sorting them and.... (Well I didn't read further, 'cause that's definitely not how I was gonna proceed)
I tried a different approach while trying to not use a second vector.
So here's the approach:
while (!vec_students.empty()) {
    std::vector<int>::iterator iterator = std::min_element(vec_students.begin(), vec_students.end());
    std::cout << std::distance(vec_students.begin(), iterator) + 1 << " ";
    vec_students.erase(iterator);
}
Now in this approach, the problem I am facing is that once I use erase, the index of all elements decreases by a certain incrementing value. So this was the solution I thought of:
while (!vec_students.empty()) {
    static int i = 0; // yeah I know statics are zero-initialised anyway
    std::vector<int>::iterator iterator = std::min_element(vec_students.begin(), vec_students.end());
    std::cout << std::distance(vec_students.begin(), iterator) + i << " ";
    vec_students.erase(iterator);
    i++;
}
Now the thought goes like this:
Initial Solution:
vector: 2 3 1
expected output: 3 1 2 (For explanation refer above)
first index = indexof(min(2,3,1)) -> 2 (while printing add 1) -> 3
second index = indexof(min(2,3)) -> 0 (while printing....) -> 1
third index = indexof(min(3)) -> 0 (while...) -> 1
Then I realized that the vector size decreases which means, indices will decrease (by 1 in this case)
so I added the extra i thingy.
Solution working:
vector: 2 3 1 i = 0
first index = indexof(min(2,3,1)) -> 3 -> add i -> 3 -> increment i -> i = 1
second index = indexof(min(2,3)) -> 0 -> add i -> 1 -> increment i -> i = 2
third index = indexof(min(3)) -> 0 -> add i -> 2 -> increment i -> i = 3
and the program ends.
But in the above case, instead of 3 1 2 I am getting 3 2 3 (first value right, rest incremented by 1)
What is wrong with my logic?
the index of all elements decreases by a certain incrementing value.
Not all, just the ones that come after the one you removed. Here's one way to do it without making another vector:
#include <algorithm>
#include <iostream>
#include <limits>
#include <vector>
int main() {
    std::vector<int> v{3, 2, 4, 1, 5};
    auto const* beg = v.data();
    auto sz = v.size();
    while (sz--) {
        auto const min = std::min_element(v.begin(), v.end());
        std::cout << &*min - beg << ' ';
        *min = std::numeric_limits<int>::max();
    }
}
This won't work properly if you have INT_MAX in your vector. In any case, making a second vector could yield better solutions. An example:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
int main() {
    std::vector<int> v{3, 2, 4, 1, 5};
    std::vector<int const*> addresses;
    addresses.reserve(v.size());
    std::transform(v.cbegin(), v.cend(), std::back_inserter(addresses),
                   [](auto const& elm) { return &elm; });
    std::sort(addresses.begin(), addresses.end(),
              [](int const* const ptr1, int const* const ptr2) {
                  return *ptr1 < *ptr2;
              });
    for (auto* p : addresses) {
        std::cout << p - v.data() << ' ';
    }
}
You think you should adjust the indices because the vector shrinks.
Well, not really; only those after the removed element are affected.
Example:
[2,1,3] => [2,3]
The index of 2 remains 0,
while the index of 3 becomes 1 instead of 2.
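For completeness, here is a sketch that keeps the erase-based idea but carries the original indices along in a parallel vector, so no offset compensation is needed (variable names are illustrative):
#include <algorithm>
#include <iostream>
#include <numeric>
#include <vector>
int main() {
    std::vector<int> v{3, 2, 4, 1, 5};
    std::vector<int> idx(v.size());
    std::iota(idx.begin(), idx.end(), 0); // original positions 0..n-1
    while (!v.empty()) {
        auto it = std::min_element(v.begin(), v.end());
        auto pos = it - v.begin();
        std::cout << idx[pos] + 1 << ' '; // +1 for 1-based printing
        v.erase(it);
        idx.erase(idx.begin() + pos);     // keep both vectors in lockstep
    }
    // prints: 4 2 1 3 5
}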

Is there any number repeated in the array?

There's an array of size n. The values can be between 0 and (n-1), the same range as the indices.
For example: array[4] = {0, 2, 1, 3}
I should say if there's any number that is repeated more than once.
For example: array[5] = {3,4,1,2,4} -> return true because 4 is repeated.
This question has so many different solutions and I would like to know if this specific solution is alright (if yes, please prove, else refute).
My solution (let's look at the next example):
array: indices 0 1 2 3 4
values 3 4 1 2 0
So I suggest:
1. Count the sum of the indices (4x5 / 2 = 10) and check that the values' sum (3+4+1+2+0) is equal to this sum. If not, there's a repeated number.
2. In addition to the first condition, take the product of the indices (except 0, so 1x2x3x4) and check if it's equal to the values' product (except 0, so 3x4x1x2).
=> If under both conditions it's equal, then I say that there is NO repeated number; otherwise, there IS a repeated number.
Is it correct? If yes, please prove it or show me a link; else, please refute it.
Why is your algorithm wrong?
Your solution is wrong; here is a counterexample (there may be simpler ones, but I found this one quite quickly):
int arr[13] = {1, 1, 2, 3, 4, 10, 6, 7, 8, 9, 10, 11, 6};
The sum is 78 and the product is 479001600. If you take the normal array of size 13:
int arr[13] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
it also has a sum of 78 and a product of 479001600, so your algorithm does not work.
How to find counterexamples? [1]
To find a counterexample [2][3]:
Take an array from 0 to N - 1;
Pick two even numbers M1 > 2 and M2 > 2 between 0 and N - 1 and halve them;
Replace P1 = M1/2 - 1 by 2 * P1 and P2 = M2/2 + 1 by 2 * P2.
In the original array you have:
Product = M1 * P1 * M2 * P2
Sum = 0 + M1 + P1 + M2 + P2
    = M1 + M1/2 - 1 + M2 + M2/2 + 1
    = 3/2 * (M1 + M2)
In the new array you have:
Product = M1/2 * 2 * P1 * M2/2 * 2 * P2
        = M1 * P1 * M2 * P2
Sum = M1/2 + 2 * P1 + M2/2 + 2 * P2
    = M1/2 + 2 * (M1/2 - 1) + M2/2 + 2 * (M2/2 + 1)
    = 3/2 * M1 - 2 + 3/2 * M2 + 2
    = 3/2 * (M1 + M2)
So both arrays have the same sum and product, but one has repeated values, so your algorithm does not work.
[1] This is one method of finding counterexamples; there may be others (there are probably others).
[2] This is not exactly the same method I used to find the first counterexample - in the original method, I used only one number M and relied on the fact that you can replace 0 by 1 without changing the product, but I propose a more general method here in order to avoid arguments such as "But I can add a check for 0 in my algorithm.".
[3] That method does not work with small arrays because you need to find 2 even numbers M1 > 2 and M2 > 2 such that M1/2 != M2 (and reciprocally) and M1/2 - 1 != M2/2 + 1, which (I think) is not possible for any array with a size lower than 14.
What algorithms do work? [4]
Algorithm 1: O(n) time and space complexity.
If you can allocate a new array of size N, then:
#include <array>
template <std::size_t N>
bool has_repetition (std::array<int, N> const& array) {
    std::array<bool, N> rep = {false};
    for (auto v: array) {
        if (rep[v]) {
            return true;
        }
        rep[v] = true;
    }
    return false;
}
Algorithm 2: O(n log n) time complexity and O(1) space complexity, with a mutable array.
You can simply sort the array:
#include <algorithm>
#include <array>
#include <iterator>
template <std::size_t N>
bool has_repetition (std::array<int, N>& array) {
    std::sort(std::begin(array), std::end(array));
    auto it = std::begin(array);
    auto ne = std::next(it);
    while (ne != std::end(array)) {
        if (*ne == *it) {
            return true;
        }
        ++it; ++ne;
    }
    return false;
}
Algorithm 3: O(n^2) time complexity and O(1) space complexity, with a non-mutable array.
#include <array>
#include <iterator>
template <std::size_t N>
bool has_repetition (std::array<int, N> const& array) {
    for (auto it = std::begin(array); it != std::end(array); ++it) {
        for (auto jt = std::next(it); jt != std::end(array); ++jt) {
            if (*it == *jt) {
                return true;
            }
        }
    }
    return false;
}
[4] These algorithms do work, but there may exist other ones that perform better - these are only the simplest ones I could think of given some "restrictions".
What's wrong with your method?
Your method computes some statistics of the data and compares them with those expected for a permutation (= correct answer). While a violation of any of these comparisons is conclusive (the data cannot satisfy the constraint), the converse is not necessarily the case. You only look at two statistics, and these are too few for sufficiently large data sets. Owing to the fact that the data are integer, the smallest number of data for which your method may fail is larger than 3.
If you are searching for duplicates in your array, there is a simple way:
#include <iostream>
bool has_duplicate(const int* array, int N) // e.g. int array[5] = {1,2,3,4,4};
{
    for (int i = 0; i < N; i++) {
        for (int j = i + 1; j < N; j++) {
            if (array[j] == array[i]) {
                std::cout << "DUPLICATE FOUND\n";
                return true;
            }
        }
    }
    return false;
}
Another simple way to find duplicates is to use the std::set container, for example:
std::set<int> set_int;
set_int.insert(5);
set_int.insert(5);
set_int.insert(4);
set_int.insert(4);
set_int.insert(5);
std::cout << "\nsize " << set_int.size();
The output will be 2, because there are only 2 distinct values.
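Relatedly, a small sketch that uses the return value of insert() instead of comparing sizes - .second is false when the value was already present:
#include <set>
#include <vector>
bool has_repetition(const std::vector<int>& a) {
    std::set<int> seen;
    for (int v : a)
        if (!seen.insert(v).second) // insertion failed: v was seen before
            return true;
    return false;
}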
A more in-depth explanation of why your algorithm is wrong:
Count the sum of the indices (4x5 / 2 = 10) and check that the values' sum (3+4+1+2+0) is equal to this sum. If not, there's a repeated number.
Given any array A which has no duplicates, it is easy to create an array that meets your first requirement but now contains duplicates. Just take two values, subtract some value v from one of them and add it to the other. Or take multiple values and make sure the sum of them stays the same. (As long as the new values are still within the 0 .. N-1 range.) For N = 3 it is already possible to change {0,1,2} to {1,1,1}. For an array of size 3, there are 7 compositions that have the correct sum, but 1 is a false positive. For an array of size 4, 20 out of 44 have duplicates; for size 5, 261 out of 381; for size 6, 3612 out of 4332; and so on. It is safe to say that the number of false positives grows much faster than real positives.
In addition to the first condition, take the product of the indices (except 0, so 1x2x3x4) and check if it's equal to the values' product.
The second requirement involves the product of all indices above 0. It is easy to realize this could never be a very strong restriction either. As soon as one of the indices is not prime, the product of all indices is no longer uniquely tied to the multiplicands, and a list can be constructed of different values with the same result. E.g. a pair of 2 and 6 can be replaced with 3 and 4, 2 and 9 can be replaced with 6 and 3, and so on. Obviously the number of false positives increases as the array size gets larger and more non-prime values are used as multiplicands.
Neither of these requirements is really strong, and they cannot compensate for each other. Since 0 is not even considered for the second restriction, a false positive can be created fairly easily for arrays starting at size 5: any pair of 0 and 4 can simply be replaced with two 2's in any unique array, for example {2, 1, 2, 3, 2}.
What you would need is a result that is uniquely tied to the occurring values. You could tweak your second requirement into a more complex approach, skipping over the non-prime values and taking 0 into account. For example, you could use the first prime (2) as the multiplicand for 0, use 3 as the multiplicand for 1, 5 as the multiplicand for 2, and so on. That would work (you would not need the first requirement), but this approach would be overly complex. A simpler way to get a unique result would be to OR the i-th bit for each value (0 => 1 << 0, 1 => 1 << 1, 2 => 1 << 2, and so on). Obviously it is faster to check whether a bit was already set by a reoccurring value, rather than wait for the final result - and this is conceptually the same as using a bool array/vector from the other examples! See the sketch below.
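A sketch of that bit-OR idea, checking each bit as we go rather than waiting for the final result (values are assumed to lie in 0 .. N-1, as the question states):
#include <cstdint>
#include <vector>
bool has_repetition(const std::vector<int>& a) {
    // one bit per possible value 0 .. N-1
    std::vector<std::uint64_t> bits((a.size() + 63) / 64, 0);
    for (int v : a) {
        std::uint64_t& word = bits[v / 64];
        std::uint64_t mask = std::uint64_t{1} << (v % 64);
        if (word & mask)
            return true; // bit already set: v is repeated
        word |= mask;
    }
    return false;
}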

STL approaches for given loop

This loop works as expected. However, is there any STL approach that mimics the exact functionality of the example below?
for (auto i = vec.size() - 1; i > 0; --i)
{
    vec[i] = vec[i - 1];
}
Rather than an insertion or a rotate, all we're doing here is copying, so it seems like the thing to use is a copy. We could do the job with reverse_iterators:
std::copy(vec.rbegin() + 1, vec.rend(), vec.rbegin());
...or with the algorithm really intended specifically for this sort of situation, std::copy_backward:
std::copy_backward(vec.begin(), vec.end() - 1, vec.end());
Either way, it's simple, straightforward, and about as efficient as possible (almost certainly more efficient than using insert/pop or rotate/assign).
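A quick self-contained check (a minimal demo, using 1..6 as sample values):
#include <algorithm>
#include <iostream>
#include <vector>
int main() {
    std::vector<int> vec{1, 2, 3, 4, 5, 6};
    std::copy_backward(vec.begin(), vec.end() - 1, vec.end());
    for (int x : vec)
        std::cout << x << ' '; // prints: 1 1 2 3 4 5
}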
std::rotate:
template< class ForwardIt >
ForwardIt rotate( ForwardIt first, ForwardIt n_first, ForwardIt last );
Used as (for a vector v)
// rotation left
std::rotate(v.begin(), v.begin() + 1, v.end());
// example:
// initial v: 1 2 3 4 5
// after rotate: 2 3 4 5 1
// rotation right (as in your "script")
std::rotate(v.rbegin(), v.rbegin() + 1, v.rend());
// example:
// initial v: 1 2 3 4 5
// after rotate: 5 1 2 3 4
// now if you do this, then it'll have the same effect as your code.
v[0] = v[1];
//before assignment: 5 1 2 3 4
//after assignment: 1 1 2 3 4
The difference w.r.t. your example is that, here, the first element will receive the previously last element (whereas in your code, the first element is untouched).
Performs a left rotation on a range of elements.
Specifically, std::rotate swaps the elements in the range [first,
last) in such a way that the element n_first becomes the first element
of the new range and n_first - 1 becomes the last element.
A precondition of this function is that [first, n_first) and [n_first,
last) are valid ranges.
http://en.cppreference.com/w/cpp/algorithm/rotate
The exact equivalent? Assuming the vector is not empty:
auto val = vec.front(); //Just in case the list is 1 element long.
vec.pop_back();
vec.insert(vec.begin(), val);
Your code effectively does this:
1 2 3 4 5 6
1 1 2 3 4 5
The first element is in two places, while the last element is lost. The above code does the same.

Majority element - parts of an array

I have an array filled with integers. My job is to find the majority element quickly for any part of the array, and I need to do it in... log n time, not linear; but beforehand I can take some time to prepare the array.
For example:
1 5 2 7 7 7 8 4 6
And queries:
[4, 7] returns 7
[4, 8] returns 7
[1, 2] returns 0 (no majority element), and so on...
I need to have an answer for each query, and it needs to execute fast.
For preparation, I can use O(n log n) time.
O(log n) queries and O(n log n) preprocessing/space could be achieved by finding and using majority intervals with the following properties:
For each value from the input array there may be one or several majority intervals (or there may be none if elements with these values are too sparse; we don't need majority intervals of length 1, because they would be useful only for query intervals of size 1, which are better handled as a special case).
If the query interval lies completely inside one of these majority intervals, the corresponding value may be the majority element of this query interval.
If there is no majority interval completely containing the query interval, the corresponding value cannot be the majority element of this query interval.
Each element of the input array is covered by O(log n) majority intervals.
In other words, the only purpose of majority intervals is to provide O(log n) majority element candidates for any query interval.
This algorithm uses the following data structures:
1. A list of positions for each value from the input array (map<Value, vector<Position>>). Alternatively, an unordered_map may be used here to improve performance (but then we'd need to extract all keys and sort them so that structure #3 is filled in the proper order).
2. A list of majority intervals for each value (vector<Interval>).
3. A data structure for handling queries (vector<small_map<Value, Data>>), where Data contains two indexes of the appropriate vector from structure #1, pointing to the next/previous positions of elements with the given value. Update: thanks to @justhalf, it is better to store in Data the cumulative frequencies of elements with the given value. small_map may be implemented as a sorted vector of pairs - preprocessing will append elements already in sorted order, and queries will use small_map only for linear search.
Preprocessing:
1. Scan the input array and push the current position to the appropriate vector in structure #1.
2. Perform steps 3 .. 4 for every vector in structure #1.
3. Transform the list of positions into a list of majority intervals. See details below.
4. For each index of the input array covered by one of the majority intervals, insert data into the appropriate element of structure #3: the value and the positions of the previous/next elements with this value (or the cumulative frequency of this value).
Query:
1. If the query interval length is 1, return the corresponding element of the source array.
2. For the starting point of the query interval, get the corresponding element of structure #3's vector. For each element of the map, perform step 3. Scan all elements of the map corresponding to the ending point of the query interval in parallel with this map, to allow O(1) complexity for step 3 (instead of O(log log n)).
3. If the map corresponding to the ending point of the query interval contains a matching value, compute s3[stop][value].prev - s3[start][value].next + 1. If it is greater than half of the query interval, return value. If cumulative frequencies are used instead of next/previous indexes, compute s3[stop+1][value].freq - s3[start][value].freq instead.
4. If nothing is found in step 3, return "Nothing".
The main part of the algorithm is getting majority intervals from the list of positions:
1. Assign a weight to each position in the list: number_of_matching_values_to_the_left - number_of_nonmatching_values_to_the_left.
2. Filter only weights in strictly decreasing order (greedily) into the "prefix" array: for (auto x: positions) if (x < prefix.back()) prefix.push_back(x);.
3. Filter only weights in strictly increasing order (greedily, backwards) into the "suffix" array: reverse(positions); for (auto x: positions) if (x > suffix.back()) suffix.push_back(x);.
4. Scan the "prefix" and "suffix" arrays together and find intervals from every "prefix" element to the corresponding place in the "suffix" array, and from every "suffix" element to the corresponding place in the "prefix" array. (If all "suffix" elements' weights are less than that of the given "prefix" element, or their position is not to the right of it, no interval is generated; if there is no "suffix" element with exactly the weight of the given "prefix" element, take the nearest "suffix" element with a larger weight and extend the interval by this weight difference to the right.)
5. Merge overlapping intervals.
Properties 1 .. 3 for majority intervals are guaranteed by this algorithm. As for property #4, the only way I could imagine covering some element with the maximum number of majority intervals is like this: 11111111222233455666677777777. Here element 4 is covered by 2 * log n intervals, so this property seems to be satisfied. See a more formal proof of this property at the end of this post.
Example:
For the input array "0 1 2 0 0 1 1 0", the following lists of positions would be generated:
value   positions
0       0 3 4 7
1       1 5 6
2       2
Positions for value 0 will get the following properties:
weights: 0:1 3:0 4:1 7:0
prefix: 0:1 3:0 (strictly decreasing)
suffix: 4:1 7:0 (strictly increasing when scanning backwards)
intervals: 0->4 3->7 4->0 7->3
merged intervals: 0-7
Positions for value 1 will get the following properties:
weights: 1:0 5:-2 6:-1
prefix: 1:0 5:-2
suffix: 1:0 6:-1
intervals: 1->none 5->6+1 6->5-1 1->none
merged intervals: 4-7
Query data structure:
positions   value   next   prev
0           0       0      x
1..2        0       1      0
3           0       1      1
4           0       2      2
4           1       1      x
5           0       3      2
...
Query [0,4]:
prev[4][0]-next[0][0]+1=2-0+1=3
query size=5
3>2.5, returned result 0
Query [2,5]:
prev[5][0]-next[2][0]+1=2-1+1=2
query size=4
2=2, returned result "none"
Note that there is no attempt to inspect element "1" because its majority interval does not include either of these intervals.
Proof of property #4:
Majority intervals are constructed in such a way that strictly more than 1/3 of all their elements have the corresponding value. This ratio is nearest to 1/3 for sub-arrays like any*(m-1) value*m any*m, for example, 01234444456789.
To make this proof more obvious, we could represent each interval as a point in 2D: every possible starting point represented by the horizontal axis and every possible ending point represented by the vertical axis (see diagram below).
All valid intervals are located on or above the diagonal. The white rectangle represents all intervals covering some array element (represented as a unit-size interval at its lower right corner).
Let's cover this white rectangle with squares of sizes 1, 2, 4, 8, 16, ... sharing the same lower right corner. This divides the white area into O(log n) areas similar to the yellow one (plus a single square of size 1 containing a single interval of size 1, which is ignored by this algorithm).
Let's count how many majority intervals may be placed into the yellow area. One interval (located at the corner nearest to the diagonal) occupies 1/4 of the elements belonging to the interval at the corner farthest from the diagonal (and this largest interval contains all elements belonging to any interval in the yellow area). This means that the smallest interval contains strictly more than 1/12 of the values available for the whole yellow area. So if we try to place 12 intervals into the yellow area, we do not have enough elements for different values. So the yellow area cannot contain more than 11 majority intervals, and the white rectangle cannot contain more than 11 * log n majority intervals. Proof completed.
11 * log n is an overestimation. As I said earlier, it's hard to imagine more than 2 * log n majority intervals covering some element. And even this value is much greater than the average number of covering majority intervals.
C++11 implementation. See it either at ideone or here:
#include <iostream>
#include <vector>
#include <map>
#include <algorithm>
#include <functional>
#include <random>

constexpr int SrcSize = 1000000;
constexpr int NQueries = 100000;

using src_vec_t = std::vector<int>;
using index_vec_t = std::vector<int>;
using weight_vec_t = std::vector<int>;
using pair_vec_t = std::vector<std::pair<int, int>>;
using index_map_t = std::map<int, index_vec_t>;
using interval_t = std::pair<int, int>;
using interval_vec_t = std::vector<interval_t>;
using small_map_t = std::vector<std::pair<int, int>>;
using query_vec_t = std::vector<small_map_t>;

constexpr int None = -1;
constexpr int Junk = -2;

src_vec_t generate_e()
{ // good query length = 3
    src_vec_t src;
    std::random_device rd;
    std::default_random_engine eng{rd()};
    auto exp = std::bind(std::exponential_distribution<>{0.4}, eng);
    for (int i = 0; i < SrcSize; ++i)
    {
        int x = exp();
        src.push_back(x);
        //std::cout << x << ' ';
    }
    return src;
}

src_vec_t generate_ep()
{ // good query length = 500
    src_vec_t src;
    std::random_device rd;
    std::default_random_engine eng{rd()};
    auto exp = std::bind(std::exponential_distribution<>{0.4}, eng);
    auto poisson = std::bind(std::poisson_distribution<int>{100}, eng);
    while (int(src.size()) < SrcSize)
    {
        int x = exp();
        int n = poisson();
        for (int i = 0; i < n; ++i)
        {
            src.push_back(x);
            //std::cout << x << ' ';
        }
    }
    return src;
}

src_vec_t generate()
{
    //return generate_e();
    return generate_ep();
}

int trivial(const src_vec_t& src, interval_t qi)
{
    int count = 0;
    int majorityElement = 0; // will be assigned before use for valid args
    for (int i = qi.first; i <= qi.second; ++i)
    {
        if (count == 0)
            majorityElement = src[i];
        if (src[i] == majorityElement)
            ++count;
        else
            --count;
    }
    count = 0;
    for (int i = qi.first; i <= qi.second; ++i)
    {
        if (src[i] == majorityElement)
            count++;
    }
    if (2 * count > qi.second + 1 - qi.first)
        return majorityElement;
    else
        return None;
}

index_map_t sort_ind(const src_vec_t& src)
{
    int ind = 0;
    index_map_t im;
    for (auto x: src)
        im[x].push_back(ind++);
    return im;
}

weight_vec_t get_weights(const index_vec_t& indexes)
{
    weight_vec_t weights;
    for (int i = 0; i != int(indexes.size()); ++i)
        weights.push_back(2 * i - indexes[i]);
    return weights;
}

pair_vec_t get_prefix(const index_vec_t& indexes, const weight_vec_t& weights)
{
    pair_vec_t prefix;
    for (int i = 0; i != int(indexes.size()); ++i)
        if (prefix.empty() || weights[i] < prefix.back().second)
            prefix.emplace_back(indexes[i], weights[i]);
    return prefix;
}

pair_vec_t get_suffix(const index_vec_t& indexes, const weight_vec_t& weights)
{
    pair_vec_t suffix;
    for (int i = indexes.size() - 1; i >= 0; --i)
        if (suffix.empty() || weights[i] > suffix.back().second)
            suffix.emplace_back(indexes[i], weights[i]);
    std::reverse(suffix.begin(), suffix.end());
    return suffix;
}

interval_vec_t get_intervals(const pair_vec_t& prefix, const pair_vec_t& suffix)
{
    interval_vec_t intervals;
    int prev_suffix_index = 0; // will be assigned before use for correct args
    int prev_suffix_weight = 0; // same assumptions
    for (int ind_pref = 0, ind_suff = 0; ind_pref != int(prefix.size());)
    {
        auto i_pref = prefix[ind_pref].first;
        auto w_pref = prefix[ind_pref].second;
        if (ind_suff != int(suffix.size()))
        {
            auto i_suff = suffix[ind_suff].first;
            auto w_suff = suffix[ind_suff].second;
            if (w_pref <= w_suff)
            {
                auto beg = std::max(0, i_pref + w_pref - w_suff);
                if (i_pref < i_suff)
                    intervals.emplace_back(beg, i_suff + 1);
                if (w_pref == w_suff)
                    ++ind_pref;
                ++ind_suff;
                prev_suffix_index = i_suff;
                prev_suffix_weight = w_suff;
                continue;
            }
        }
        // ind_suff out of bounds or w_pref > w_suff:
        auto end = prev_suffix_index + prev_suffix_weight - w_pref + 1;
        // end may be out-of-bounds; that's OK if overflow is not possible
        intervals.emplace_back(i_pref, end);
        ++ind_pref;
    }
    return intervals;
}

interval_vec_t merge(const interval_vec_t& from)
{
    using endpoints_t = std::vector<std::pair<int, bool>>;
    endpoints_t ep(2 * from.size());
    std::transform(from.begin(), from.end(), ep.begin(),
                   [](interval_t x){ return std::make_pair(x.first, true); });
    std::transform(from.begin(), from.end(), ep.begin() + from.size(),
                   [](interval_t x){ return std::make_pair(x.second, false); });
    std::sort(ep.begin(), ep.end());
    interval_vec_t to;
    int start; // will be assigned before use for correct args
    int overlaps = 0;
    for (auto& x: ep)
    {
        if (x.second) // begin
        {
            if (overlaps++ == 0)
                start = x.first;
        }
        else // end
        {
            if (--overlaps == 0)
                to.emplace_back(start, x.first);
        }
    }
    return to;
}

interval_vec_t get_intervals(const index_vec_t& indexes)
{
    auto weights = get_weights(indexes);
    auto prefix = get_prefix(indexes, weights);
    auto suffix = get_suffix(indexes, weights);
    auto intervals = get_intervals(prefix, suffix);
    return merge(intervals);
}

void update_qv(
    query_vec_t& qv,
    int value,
    const interval_vec_t& intervals,
    const index_vec_t& iv)
{
    int iv_ind = 0;
    int qv_ind = 0;
    int accum = 0;
    for (auto& interval: intervals)
    {
        int i_begin = interval.first;
        int i_end = std::min<int>(interval.second, qv.size() - 1);
        while (iv[iv_ind] < i_begin)
        {
            ++accum;
            ++iv_ind;
        }
        qv_ind = std::max(qv_ind, i_begin);
        while (qv_ind <= i_end)
        {
            qv[qv_ind].emplace_back(value, accum);
            if (iv[iv_ind] == qv_ind)
            {
                ++accum;
                ++iv_ind;
            }
            ++qv_ind;
        }
    }
}

void print_preprocess_stat(const index_map_t& im, const query_vec_t& qv)
{
    double sum_coverage = 0.;
    int max_coverage = 0;
    for (auto& x: qv)
    {
        sum_coverage += x.size();
        max_coverage = std::max<int>(max_coverage, x.size());
    }
    std::cout << " size = " << qv.size() - 1 << '\n';
    std::cout << " values = " << im.size() << '\n';
    std::cout << " max coverage = " << max_coverage << '\n';
    std::cout << " avg coverage = " << sum_coverage / qv.size() << '\n';
}

query_vec_t preprocess(const src_vec_t& src)
{
    query_vec_t qv(src.size() + 1);
    auto im = sort_ind(src);
    for (auto& val: im)
    {
        auto intervals = get_intervals(val.second);
        update_qv(qv, val.first, intervals, val.second);
    }
    print_preprocess_stat(im, qv);
    return qv;
}

int do_query(const src_vec_t& src, const query_vec_t& qv, interval_t qi)
{
    if (qi.first == qi.second)
        return src[qi.first];
    auto b = qv[qi.first].begin();
    auto e = qv[qi.second + 1].begin();
    while (b != qv[qi.first].end() && e != qv[qi.second + 1].end())
    {
        if (b->first < e->first)
        {
            ++b;
        }
        else if (e->first < b->first)
        {
            ++e;
        }
        else // if (e->first == b->first)
        {
            // hope this doesn't overflow
            if (2 * (e->second - b->second) > qi.second + 1 - qi.first)
                return b->first;
            ++b;
            ++e;
        }
    }
    return None;
}

int main()
{
    std::random_device rd;
    std::default_random_engine eng{rd()};
    auto poisson = std::bind(std::poisson_distribution<int>{500}, eng);
    int majority = 0;
    int nonzero = 0;
    int failed = 0;
    auto src = generate();
    auto qv = preprocess(src);
    for (int i = 0; i < NQueries; ++i)
    {
        int size = poisson();
        auto ud = std::uniform_int_distribution<int>(0, src.size() - size - 1);
        int start = ud(eng);
        int stop = start + size;
        auto res1 = do_query(src, qv, {start, stop});
        auto res2 = trivial(src, {start, stop});
        //std::cout << size << ": " << res1 << ' ' << res2 << '\n';
        if (res2 != res1)
            ++failed;
        if (res2 != None)
        {
            ++majority;
            if (res2 != 0)
                ++nonzero;
        }
    }
    std::cout << "majority elements = " << 100. * majority / NQueries << "%\n";
    std::cout << " nonzero elements = " << 100. * nonzero / NQueries << "%\n";
    std::cout << " queries = " << NQueries << '\n';
    std::cout << " failed = " << failed << '\n';
    return 0;
}
Related work:
As pointed out in another answer to this question, there is other work where this problem is already solved: "Range majority in constant time and linear space" by S. Durocher, M. He, I. Munro, P.K. Nicholson, M. Skala.
The algorithm presented in this paper has better asymptotic complexity for query time: O(1) instead of O(log n), and for space: O(n) instead of O(n log n).
Better space complexity allows this algorithm to process larger data sets (compared to the algorithm proposed in this answer). Less memory needed for preprocessed data and a more regular data access pattern most likely allow this algorithm to preprocess data more quickly. But it is not so easy with query time...
Let's suppose we have input data most favorable to the algorithm from the paper: n = 1000000000 (it's hard to imagine a system with more than 10..30 gigabytes of memory, in year 2013).
The algorithm proposed in this answer needs to process up to 120 (or 2 query boundaries * 2 * log n) elements for each query. But it performs very simple operations, similar to linear search. And it sequentially accesses two contiguous memory areas, so it is cache-friendly.
The algorithm from the paper needs to perform up to 20 operations (or 2 query boundaries * 5 candidates * 2 wavelet tree levels) for each query. This is 6 times fewer. But each operation is more complex. Each query to the succinct representation of bit counters itself contains a linear search (which means 20 linear searches instead of one). Worst of all, each such operation accesses several independent memory areas (unless the query size, and therefore the quadruple size, is very small), so the query is cache-unfriendly. Which means each query (while a constant-time operation) is pretty slow, probably slower than in the algorithm proposed here. If we decrease the input array size, the chances increase that the algorithm proposed here is quicker.
A practical disadvantage of the algorithm in the paper is the wavelet tree and succinct bit counter implementation. Implementing them from scratch may be pretty time-consuming. Using a pre-existing implementation is not always convenient.
the trick
When looking for a majority element, you may discard intervals that do not have a majority element. See Find the majority element in array. This allows you to solve this quite simply.
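For reference, here is a sketch of that majority-vote trick on a single range (inclusive bounds; the returned value is only a candidate and must still be verified by a counting pass):
#include <vector>
// Boyer-Moore majority vote: the only value that can possibly be the
// majority element of a[lo..hi].
int majority_candidate(const std::vector<int>& a, int lo, int hi) {
    int cand = a[lo], count = 0;
    for (int i = lo; i <= hi; ++i) {
        if (count == 0)
            cand = a[i];
        count += (a[i] == cand) ? 1 : -1;
    }
    return cand;
}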
preparation
At preparation time, recursively keep dividing the array into two halves and store these array intervals in a binary tree. For each node, count the occurrences of each element in its array interval. You need a data structure that offers O(1) inserts and reads; I suggest an std::unordered_multiset, which on average behaves as needed (though worst-case inserts are linear). Also check if the interval has a majority element and store it if it does.
runtime
At runtime, when asked to compute the majority element for a range, dive into the tree to compute the set of intervals that covers the given range exactly. Use the trick to combine these intervals.
If we have array interval 7 5 5 7 7 7, with majority element 7, we can split off and discard 5 5 7 7 since it has no majority element. Effectively the fives have gobbled up two of the sevens. What's left is an array 7 7, or 2x7. Call this number 2 the majority count of the majority element 7:
The majority count of a majority element of an array interval is the
occurrence count of the majority element minus the combined occurrence
of all other elements.
Use the following rules to combine intervals to find the potential majority element:
Discard the intervals that have no majority element
Combining two arrays with the same majority element is easy, just add up the element's majority counts. 2x7 and 3x7 become 5x7
When combining two arrays with different majority elements, the higher majority count wins. Subtract the lower majority count from the higher to find the resulting majority count. 3x7 and 2x3 become 1x7.
If their majority elements are different but have equal majority counts, disregard both arrays. 3x7 and 3x5 cancel each other out.
When all intervals have been either discarded or combined, you are either left with nothing, in which case there is no majority element, or you have one combined interval containing a potential majority element. Look up and add this element's occurrence counts in all array intervals (also the previously discarded ones) to check if it really is the majority element.
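As a sketch, the combining rules above might look like this (the Summary type and names are illustrative, not from a particular implementation):
#include <optional>
#include <utility>
// An interval summary is (majority element, majority count);
// std::nullopt means "no majority element".
using Summary = std::optional<std::pair<int, int>>;
Summary combine(Summary a, Summary b) {
    if (!a) return b;             // discarded interval: the other side wins
    if (!b) return a;
    if (a->first == b->first)     // same element: add up majority counts
        return std::make_pair(a->first, a->second + b->second);
    if (a->second == b->second)   // equal counts cancel each other out
        return std::nullopt;
    if (a->second > b->second)    // higher count wins; subtract the lower
        return std::make_pair(a->first, a->second - b->second);
    return std::make_pair(b->first, b->second - a->second);
}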
example
For the array 1,1,1,2,2,3,3,2,2,2,3,2,2, you get the tree (majority count x majority element listed in brackets)
                      1,1,1,2,2,3,3,2,2,2,3,2,2
                                (1x2)
                  /                           \
        1,1,1,2,2,3,3                    2,2,2,3,2,2
                                             (4x2)
         /           \                   /          \
    1,1,1,2         2,3,3            2,2,2         3,2,2
     (2x1)          (1x3)            (3x2)         (1x2)
    /     \         /   \            /    \        /    \
  1,1     1,2     2,3    3         2,2     2     3,2     2
 (2x1)                  (1x3)     (2x2)  (1x2)         (1x2)
  / \     / \     / \              / \            / \
 1   1   1   2   2   3            2   2          3   2
(1x1)(1x1)(1x1)(1x2)(1x2)(1x3)  (1x2)(1x2)    (1x3)(1x2)
Range [5,10] (1-indexed) is covered by the set of intervals 2,3,3 (1x3) and 2,2,2 (3x2). They have different majority elements. Subtract their majority counts and you're left with 2x2, so 2 is the potential majority element. Look up and sum the actual occurrence counts of 2 in those arrays: 1+3 = 4 out of 6. 2 is the majority element.
Range [1,10] is covered by the set of intervals 1,1,1,2,2,3,3 (no majority element) and 2,2,2 (3x2). Disregard the first interval since it has no majority element, so 2 is the potential majority element. Sum the occurrence counts of 2 in all intervals: 2+3 = 5 out of 10. There is no majority element.
Actually, it can be done in constant time and linear space(!)
See https://cs.stackexchange.com/questions/16671/range-majority-queries-most-freqent-element-in-range and S. Durocher, M. He, I Munro, P.K. Nicholson, M. Skala, Range majority in constant time and linear space, Information and Computation 222 (2013) 169–179, Elsevier.
Their preparation time is O(n log n), the space needed is O(n), and queries take O(1). It is a theoretical paper and I don't claim to understand all of it, but it seems far from impossible to implement. They're using wavelet trees.
For an implementation of wavelet trees, see https://github.com/fclaude/libcds
If you have unlimited memory and a limited data range (like short int), you can do it even in O(N) time.
Go through the array and count the number of 1s, 2s, 3s, etc. (the number of entries for each value you have in the array). You will need an additional array X with sizeof(YourType) elements for this.
Go through array X and find the maximum.
In total, O(1) + O(N) operations.
You can also limit yourself to O(N) memory if you use a map instead of array X.
But then you will need to find the element on each iteration at stage 1, so you will need O(N*log(N)) time in total.
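A sketch of that counting idea, assuming values fit in a small range 0 .. maxv (the array "X" from the description):
#include <vector>
int most_frequent(const std::vector<int>& a, int maxv) {
    std::vector<int> X(maxv + 1, 0); // one counter per possible value
    for (int v : a)
        ++X[v];
    int best = 0;
    for (int v = 1; v <= maxv; ++v)
        if (X[v] > X[best])
            best = v;
    return best;
}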
You can use a MAX heap, with the frequency of each number as the deciding factor for keeping the max-heap property.
I mean, e.g. for the following input array
1 5 2 7 7 7 8 4 6 5
the heap would have all distinct elements with their frequency associated with them:
Element = 1 Frequency = 1,
Element = 5 Frequency = 2,
Element = 2 Frequency = 1,
Element = 7 Frequency = 3,
Element = 8 Frequency = 1,
Element = 4 Frequency = 1,
Element = 6 Frequency = 1
As it's a MAX heap, element 7 with frequency 3 would be at the root level.
Just check whether the input range contains this element. If yes, then this is the answer; if no, then go to the left subtree or right subtree as per the input range and perform the same checks.
O(N) would be required only once, while creating the heap; once it's created, searching will be efficient.
Edit: Sorry, I was solving a different problem.
Sort the array and build an ordered list of pairs (value, number_of_occurrences) - it's O(N log N). Starting with
1 5 2 7 7 7 8 4 6
it will be
(1,1) (2,1) (4,1) (5,1) (6,1) (7,3) (8,1)
On top of this array, build a binary tree with pairs (best_value_or_none, max_occurrences). It will look like:
(1,1) (2,1) (4,1) (5,1) (6,1) (7,3) (8,1)
    \  /        \  /        \  /      |
    (0,1)       (0,1)       (7,3)   (8,1)
         \      /                \  /
          (0,1)                 (7,3)
                \               /
                     (7,3)
This structure definitely has a fancy name, but I don't remember it :)
From here, it's O(log N) to fetch the mode of any interval. Any interval can be split into O(log N) precomputed intervals; for example:
[4, 7] = [4, 5] + [6, 7]
f([4,5]) = (0,1)
f([6,7]) = (7,3)
and the result is (7,3).