I have a class functor (too complex to implement as a lambda), but to strip the idea down, I want to ensure the functor satisfies the Compare predicate. The issue is, I want all values larger than (1) to yield ascending order, but to place all values of (1) at the 'end' - e.g., treated as 'larger' values.
e.g., {2, 2, 2, 3, 3, 3, 4, 5, 6, 6, ..., 1, 1, 1}
The function object is implemented as a struct to extract arguments from a (complicated) object reference it is constructed with, but the important part is the method in the function object. To simplify:
bool operator () (unsigned i, unsigned j)
{
    if (i == 1) return false; // 1 is never less than anything  (1 >= x)
    if (j == 1) return true;  // everything else is less than 1 (x <= 1)
    return (i < j);           // otherwise, ordinary ascending order
}
This appears to work as expected with std::sort and std::stable_sort. But I'm still not convinced it correctly satisfies the criteria for Compare, in terms of strict weak ordering. Note that, under this ordering, x <= 1 in all cases - that is, for i, j >= 1. Clearly, comp(1, 1) => false.
Is my 'tweaked' functor correct, even as it places values of (1) at the end? That is, has (1) been handled so that it is interpreted as greater than all values x > 1? Or have I just been lucky with my sort implementations?
As I should have clarified, the value (0) does not occur. I originally had this in a comment for the (very clever) accepted answer but mistakenly deleted it.
If you can define a bijective operation under which the comparison is a total/weak order, then you are fine.
It turns out that for your type (unsigned) this is simply -=2/+=2:
bool operator()(unsigned i, unsigned j) const {
    return (i - 2) < (j - 2); // unsigned arithmetic wraps around 0
}
Well, that also depends on what you want to do with zero.
This relies on 1 - 2 == std::numeric_limits<unsigned>::max(), so when you "compare" e.g. 1 with x you get std::numeric_limits<unsigned>::max() < x - 2, which is false even if x is also 1 (though comparing 0 with 1 yields true, so a 0, if present, would sort just before the 1s).
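For instance, a quick sanity check of the trick (assuming, as clarified above, that the value 0 never occurs):

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<unsigned> v = {1, 6, 2, 1, 3, 5, 2, 1, 4};
    std::sort(v.begin(), v.end(),
              [](unsigned i, unsigned j) { return (i - 2) < (j - 2); });
    for (unsigned x : v) std::cout << x << ' '; // prints: 2 2 3 4 5 6 1 1 1
}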
Related
I want to order a std::map by the absolute value of its keys, using the following code as the custom ordering:
vector<int> A = {-4,4,-2,2};
auto cmp = [](int a, int b) { return abs(a) < abs(b); };
map<int, int, decltype(cmp)> m(cmp);
for (int x : A) m[x]++;
But the result is m = {{-4,2},{-2,2}}. I don't know why the keys 4 and 2 are missing. I want to keep all values; actually, I don't care about ties, i.e. the orders 4,-4 and -4,4 are both OK for me. I only want to order the keys when their absolute values are different.
Your comparison function causes your map to treat keys 4 and -4 as equivalent. Ditto for keys 2 and -2, so that explains the results you get, since keys in a map must be unique.
As other answers mention, your comparison operator renders the keys like 4 and -4 equal, as far as the map is concerned.
Solution
You can create your own ordering (with a custom comparison operator) to achieve what you want:
auto cmp = [](int a, int b) {
    auto aa = abs(a);
    auto bb = abs(b);
    // negatives map to 2*|x|, non-negatives to 2*|x| + 1
    return (2 * aa + (aa == a)) < (2 * bb + (bb == b));
};
outputs:
{-2, 1} {2, 1} {-4, 1} {4, 1}
Explanation
The numbers -4, -2, 2, 4 and so on are integers. Inside the comparison function, you can map them to natural numbers and create a tie breaker for the positive ones by:
taking double the absolute value of each negative number
taking double the absolute value of each positive number, plus 1
So your keys K = {k1, k2, ..., kn} behave as K' = {k1', k2', ..., kn'}, where

ki' = 2 * |ki|       if ki < 0
ki' = 2 * |ki| + 1   if ki > 0
To order the int keys as 0, 1, -1, 2, -2, 3, -3, ... change the comparator to this:
auto cmp = [](int a, int b) { return abs(a) != abs(b) ? abs(a) < abs(b) : a > b; };
This comparator uses lexicographical comparison with two sort keys:
The absolute value of the number.
The sign of the number.
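If you prefer, the same two-key comparison can be written with std::make_tuple. A sketch of mine, assuming the keys never equal INT_MIN (where abs and negation would overflow):

#include <cstdlib>
#include <tuple>

auto cmp = [](int a, int b) {
    // first key: |x| ascending; second key: -x ascending, i.e. positive before negative
    return std::make_tuple(std::abs(a), -a) < std::make_tuple(std::abs(b), -b);
};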
If you check the std::map documentation from cppreference.com:
Everywhere the standard library uses the Compare requirements, uniqueness is determined by using the equivalence relation. In imprecise terms, two objects a and b are considered equivalent (not unique) if neither compares less than the other: !comp(a, b) && !comp(b, a).
Because of that, after you've already inserted -4, the map works out that neither of 4 and -4 is less than the other, so it deems them equivalent and increments that map value. If you'd inserted 4 first, you would have had a single entry for 4 instead of -4.
Personally, I consider this confusing and would recommend using a std::map<int, int> and for (auto a : A) m[abs(a)]++;.
I'm trying to make a program that gives the last element that is less than or equal to our given value.
According to the definition of lower_bound, it gives the first element that is greater than or equal to the given key value passed. I created a comparator function
bool compare(int a, int b) {
    return a <= b;
}
This was passed in my lower bound function:
int largest_idx = lower_bound(ic, ic + n, m, compare) - ic;
On execution, it was giving me the last element that was less than or equal to my m (key value). Isn't this the opposite of how lower_bound works? lower_bound is supposed to give me the first value for my comparison - or does the comparator actually change that?
If you want to turn "first" into "last", you have two options. First, you can use std::upper_bound and then take the previous element (see below). Second, you can use reverse iterators:
const auto pos = std::lower_bound(
    std::make_reverse_iterator(ic + n),
    std::make_reverse_iterator(ic), m, compare);
where compare is
bool compare(int a, int b) {
    return b < a;
}
With this comparator, std::lower_bound() returns the iterator pointing to the first element that is not greater than m (that is, less than or equal to m). On the reversed range, this is equivalent to returning the iterator pointing to the last element satisfying this criterion in the original range.
Simple example:
int ic[] = {1, 3, 3, 5};
//             pos
// m = 1    ^
// m = 2    ^
// m = 3          ^
How do I modify that search criteria (change <= to something else)?
std::lower_bound finds the first element in the range (partitioned by the comparator into true, ..., true, false, ... false), for which the comparator returns false. If your criterion can be rephrased in this language, you can use std::lower_bound.
Suppose we have a range 1 3 3 5 and we replace < with <= (your version of compare). Then we have:
        1 3 3 5
m = 2   T F F F
m = 3   T T T F
m = 4   T T T F
For m = 3 and m = 4, std::lower_bound will return the iterator to 5, i.e. past the last 3. In other words, std::lower_bound with default < being replaced with <= is exactly what std::upper_bound with default < is. You can advance the resulting iterator by -1 to get the last element (but be careful about corner cases like m = 0 in this example).
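For illustration, a sketch of that approach (the helper name find_last_leq is mine, not a library function):

#include <algorithm>

// Returns a pointer to the last element <= m, or `last` if there is none
// (the m = 0 corner case mentioned above).
const int* find_last_leq(const int* first, const int* last, int m) {
    const int* pos = std::upper_bound(first, last, m); // first element > m
    return pos == first ? last : pos - 1;
}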
How do I change whether I want the first or last element
It always returns the first element for which the comparator returns false. You can either reverse the range or find the first element that follows the one you want to find.
The comparator must not check for equality; use less-than.
Also, the data must already be sorted, or at least partitioned, according to the comparator.
cf. https://www.cplusplus.com/reference/algorithm/lower_bound/
I came across this GeeksForGeeks article, which reads: Rearrange positive and negative numbers using inbuilt sort function such that all negative integers appear before all the positive integers and the order of appearance should be maintained.
The comparator function of sort() was modified to achieve this.
bool comp(int a, int b)
{
    // This is to maintain order
    if ((a >= 0 && b >= 0) || (a < 0 && b < 0))
        return false;

    // Swapping is must
    if ((a >= 0) && (b < 0))
        return false;

    return true;
}
But how come is the order maintained by this block:
if ((a >= 0 && b >= 0) || (a < 0 && b < 0))
    return false;
This appears to be the original article
https://www.geeksforgeeks.org/rearrange-positive-negative-numbers-using-inbuilt-sort-function/
How it (sort of) works: std::sort rearranges everything according to a comparator. This comparator is based on "all negative numbers are exactly equal to each other. All positive numbers are exactly equal to each other. Negative numbers are smaller than positive numbers."
If you sort according to those rules, you are going to get all the negative numbers, then all the positive numbers. The comparator itself does not mess with their order, beyond looking at whether they are greater or less than zero. (And the data set conveniently doesn't have any zeroes.)
What's wrong:
1) The comparison function does not correctly handle 0: it treats 0 as equivalent to every positive number, so zeroes can end up anywhere among the non-negative values, which is wrong for the stated requirement (see below).
2) std::sort is not a stable sort. It is not guaranteed to preserve order. They got lucky on one data set. If they had used std::stable_sort, and a correct comparison function, it would have been a "built in function" which met the requirements. But the comparison function alone can't make an algorithm stable. See What is the difference between std::sort and std::stable_sort? or just look near the top of the docs on http://www.cplusplus.com/reference/algorithm/sort/
3) They do fancy tricks to come up with a solution of complexity O(n log n), for a trivially easy O(n) problem. So besides being wrong on multiple points, it is inefficient for no good reason.
Perhaps they should have considered https://en.cppreference.com/w/cpp/algorithm/stable_partition if we're allowed to just rule out zeroes in the data.
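A sketch of that approach; std::stable_partition keeps the relative order on both sides, so zeroes, if any, would simply stay in order among the non-negatives:

#include <algorithm>
#include <vector>

void rearrange(std::vector<int>& v) {
    // negatives move to the front; relative order is preserved on both sides
    // (linear given enough scratch memory, O(n log n) swaps otherwise)
    std::stable_partition(v.begin(), v.end(), [](int x) { return x < 0; });
}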
Here is a definition of strict weak ordering (also linked by Some Programmer Dude)
https://en.wikipedia.org/wiki/Weak_ordering#Strict_weak_orderings
For all x in S, it is not the case that x < x (irreflexivity).
For all x, y in S, if x < y then it is not the case that y < x (asymmetry).
For all x, y, z in S, if x < y and y < z then x < z (transitivity).
For all x, y, z in S, if x is incomparable with y (neither x < y nor y < x hold), and y is incomparable with z, then x is incomparable with z (transitivity of incomparability).
Note that comp(0, anything) returns false, so 0 is treated as equivalent to every non-negative number; it is lumped in with the positives, which is obviously wrong whenever zeroes occur in the data.
But how come is the order maintained by this block:
The order is not "maintained" by the comparator. The comparator can only tell that, after sorting, two elements a and b should end up
a before b
a after b
a together with b (before the same elements, after the same elements)
What happens to elements that are not ordered by the comparator is purely a function of the algorithm. For insertion into a multiset, the added element could end up anywhere among equivalent elements (except when insert with a hint is used).
The comp function mishandles zero in
if ((a >= 0 && b >= 0) || (a < 0 && b < 0))
    return false;
If a is 0 and b is positive, then a should compare less than b, but false is returned, not true.
Try
bool comp(int a, int b)
{
    const int first  = ((a < 0) ? -1 : (a == 0) ? 0 : 1);
    const int second = ((b < 0) ? -1 : (b == 0) ? 0 : 1);
    return (first < second);
}
and plug that into std::stable_sort
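For example, a small usage sketch with comp as defined above:

#include <algorithm>
#include <vector>

int main() {
    std::vector<int> v = {-1, 2, -3, 4, -5, 6};
    std::stable_sort(v.begin(), v.end(), comp);
    // v is now {-1, -3, -5, 2, 4, 6}: negatives first, original order kept
}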
There's an array of size n. The values can be between 0 and (n-1), the same range as the indices.
For example: array[4] = {0, 2, 1, 3}
I should say whether there's any number that is repeated more than once.
For example: array[5] = {3,4,1,2,4} -> return true because 4 is repeated.
This question has so many different solutions, and I would like to know if this specific solution is alright (if yes, please prove it; else, refute it).
My solution (let's look at the next example):
array:  indices  0 1 2 3 4
        values   3 4 1 2 0
So I suggest:
count the sum of the indices (4x5 / 2 = 10) and check that the values' sum (3+4+1+2+0) equals it. If not, there's a repeated number.
in addition to the first condition, compute the product of the indices (excluding 0, so: 1x2x3x4) and check that it equals the product of the values (excluding 0, so: 3x4x1x2).
=> if both conditions hold, I say that there is NO repeated number. Otherwise, there IS a repeated number.
Is it correct? If yes, please prove it or show me a link. Else, please refute it.
Why is your algorithm wrong?
Your solution is wrong, here is a counter example (there may be simpler ones, but I found this one quite quickly):
int arr[13] = {1, 1, 2, 3, 4, 10, 6, 7, 8, 9, 10, 11, 6};
The sum is 78 and the product is 479001600. If you take the normal array of size 13:
int arr[13] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
It also has a sum of 78 and a product of 479001600, so your algorithm does not work.
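As a quick sanity check (not needed for the argument, just verifying the two statistics):

#include <iostream>

int main() {
    int arr[13] = {1, 1, 2, 3, 4, 10, 6, 7, 8, 9, 10, 11, 6};
    long long sum = 0, product = 1;
    for (int v : arr) {
        sum += v;
        if (v != 0) product *= v; // the question's product skips zeros
    }
    std::cout << sum << ' ' << product << '\n'; // prints: 78 479001600
}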
How to find counter examples?[1]
To find a counter example[2]:
Take an array from 0 to N - 1;
Pick two even numbers[3] M1 > 2 and M2 > 2 between 0 and N - 1 and halve them;
Replace P1 = M1/2 - 1 by 2 * P1 and P2 = M2/2 + 1 by 2 * P2.
In the original array you have:

Product = M1 * P1 * M2 * P2
Sum     = M1 + P1 + M2 + P2
        = M1 + (M1/2 - 1) + M2 + (M2/2 + 1)
        = 3/2 * (M1 + M2)

In the new array you have:

Product = (M1/2) * (2 * P1) * (M2/2) * (2 * P2)
        = M1 * P1 * M2 * P2
Sum     = M1/2 + 2 * P1 + M2/2 + 2 * P2
        = M1/2 + 2 * (M1/2 - 1) + M2/2 + 2 * (M2/2 + 1)
        = (3/2 * M1 - 2) + (3/2 * M2 + 2)
        = 3/2 * (M1 + M2)

(Only the contributions of the four changed elements are shown; the remaining elements are identical in both arrays.)
So both arrays have the same sum and product, but one has repeated values, so your algorithm does not work.
[1] This is one method of finding counter examples; there may be others (there probably are).
[2] This is not exactly the same method I used to find the first counter example. In the original method, I used only one number M and exploited the fact that you can replace 0 by 1 without changing the product, but I propose a more general method here in order to avoid arguments such as "But I can add a check for 0 in my algorithm.".
[3] That method does not work with small arrays because you need to find two even numbers M1 > 2 and M2 > 2 such that M1/2 != M2 (and reciprocally) and M1/2 - 1 != M2/2 + 1, which (I think) is not possible for any array with a size lower than 14.
What algorithms do work?[4]
Algorithm 1: O(n) time and space complexity.
If you can allocate a new array of size N, then:
#include <array>
#include <cstddef>

template <std::size_t N>
bool has_repetition (std::array<int, N> const& array) {
    std::array<bool, N> rep = {}; // all entries start as false
    for (auto v: array) {
        if (rep[v]) {
            return true; // v has already been seen
        }
        rep[v] = true;
    }
    return false;
}
Algorithm 2: O(n log(n)) time complexity and O(1) space complexity, with a mutable array.
You can simply sort the array:
#include <algorithm>
#include <array>
#include <cstddef>
#include <iterator>

template <std::size_t N>
bool has_repetition (std::array<int, N> &array) {
    std::sort(std::begin(array), std::end(array)); // duplicates become adjacent
    auto it = std::begin(array);
    auto ne = std::next(it);
    while (ne != std::end(array)) {
        if (*ne == *it) {
            return true;
        }
        ++it; ++ne;
    }
    return false;
}
Algorithm 3: O(n^2) time complexity and O(1) space complexity, with a non-mutable array.
#include <array>
#include <cstddef>
#include <iterator>

template <std::size_t N>
bool has_repetition (std::array<int, N> const& array) {
    for (auto it = std::begin(array); it != std::end(array); ++it) {
        for (auto jt = std::next(it); jt != std::end(array); ++jt) {
            if (*it == *jt) {
                return true;
            }
        }
    }
    return false;
}
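Hypothetical usage, with whichever of the three versions you pick:

#include <array>

std::array<int, 5> a = {3, 4, 1, 2, 4};
bool dup = has_repetition(a); // true: 4 occurs twice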
[4] These algorithms do work, but there may exist other ones that perform better - these are only the simplest ones I could think of given some "restrictions".
What's wrong with your method?
Your method computes some statistics of the data and compares them with those expected for a permutation (i.e. the correct answers). While a violation of any of these comparisons is conclusive (the data cannot satisfy the constraint), the converse is not necessarily true. You only look at two statistics, and these are too few for sufficiently large data sets. Owing to the fact that the data are integers, the smallest number of elements for which your method may fail is larger than 3.
If you are searching for duplicates in your array, there is a simple way:

#include <iostream>

bool has_duplicate() {
    const int N = 5; // array dimensions must be constant expressions
    int array[N] = {1, 2, 3, 4, 4};
    for (int i = 0; i < N; i++) {
        for (int j = i + 1; j < N; j++) {
            if (array[j] == array[i]) {
                std::cout << "DUPLICATE FOUND\n";
                return true;
            }
        }
    }
    return false;
}
Another simple way to find duplicates is to use the std::set container, for example:
std::set<int> set_int;
set_int.insert(5);
set_int.insert(5);
set_int.insert(4);
set_int.insert(4);
set_int.insert(5);
std::cout << "\nsize " << set_int.size();
The output will be 2, because there are only 2 distinct values; since 5 values were inserted, comparing the set's size with the number of insertions tells you there were duplicates.
A more in-depth explanation of why your algorithm is wrong:
count the sum of the indices (4x5 / 2 = 10) and check that the values' sum (3+4+1+2+0) equals it. If not, there's a repeated number.
Given any array A which has no duplicates, it is easy to create an array that meets your first requirement but contains duplicates: just take two values, subtract some value v from one of them, and add v to the other. Or take multiple values and make sure their sum stays the same. (As long as the new values are still within the 0 .. N-1 range.) For N = 3 it is already possible to change {0,1,2} to {1,1,1}. For an array of size 3, there are 7 compositions that have the correct sum, but 1 is a false positive. For an array of size 4, 20 out of 44 have duplicates; for an array of size 5, 261 out of 381; for an array of size 6, 3612 out of 4332; and so on. It is safe to say that the number of false positives grows much faster than the number of real positives.
in addition to the first condition, compute the product of the indices (excluding 0, so: 1x2x3x4) and check that it equals the product of the values (excluding 0, so: 3x4x1x2).
The second requirement involves the product of all indices above 0. It is easy to see that this could never be a very strong restriction either. As soon as one of the indices is not prime, the product of all indices is no longer uniquely tied to the multiplicands, and a list of different values with the same product can be constructed. E.g. a pair of 2 and 6 can be replaced with 3 and 4, 2 and 9 can be replaced with 6 and 3, and so on. Obviously the number of false positives increases as the array size gets larger and more non-prime values are used as multiplicands.
Neither of these requirements is really strong, and they cannot compensate for each other. Since 0 is not even considered for the second restriction, a false positive can be created fairly easily for arrays starting at size 5: any pair of 0 and 4 can simply be replaced with two 2's in any duplicate-free array, for example {2, 1, 2, 3, 2}.
What you would need is a result that is uniquely tied to the occurring values. You could tweak your second requirement into a more complex approach: skip over the non-prime values and take 0 into account. For example, you could use the first prime (2) as the multiplicand for 0, 3 as the multiplicand for 1, 5 as the multiplicand for 2, and so on. That would work (you would not even need the first requirement), but it would be overly complex. A simpler way to get a unique result is to OR the i-th bit for each value (0 => 1 << 0, 1 => 1 << 1, 2 => 1 << 2, and so on), as sketched below. (Obviously it is faster to check whether a bit was already set by a reoccurring value rather than wait for the final result - and that is conceptually the same as using a bool array/vector from the other examples!)
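Here is a sketch of that bit-OR idea (the 64-bit limit is my own simplifying assumption; for larger n you would fall back to a bool array):

#include <cstdint>

// Assumes n <= 64 and all values in [0, n-1], as the question states.
bool has_repetition_bits(const int* a, int n) {
    std::uint64_t seen = 0;
    for (int i = 0; i < n; ++i) {
        const std::uint64_t bit = std::uint64_t{1} << a[i];
        if (seen & bit) return true; // bit already set: repeated value
        seen |= bit;
    }
    return false;
}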
I was thinking about writing code that creates a Pascal triangle. I've done it, but then I thought about doing it better. One idea came to my mind, but I couldn't find a proper answer for it. Is it possible to create an array which will look like this?
[1]|[1][1]|[1][2][1]|[1][3][3][1]|[1][4][6][4][1]| and so on? So my [1] would be cell (0,0) and [1][2][1] would be the elements of cells (2,0), (2,1), (2,2). I would be grateful for any advice.
You can implement a triangle array through a single-dimension array. A fixed-size version may look like this:

#include <cstddef>
#include <stdexcept>

template<typename T, size_t N>
struct TriangleArray {
    T& element(size_t i, size_t j)
    {
        if (i >= N || j >= N || i < j)
            throw std::out_of_range("incorrect index");
        return container[(i + 1) * i / 2 + j]; // rows 0..i-1 occupy i*(i+1)/2 cells
    }

private:
    T container[(N + 1) * N / 2];
};
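Hypothetical usage of this sketch:

int main() {
    TriangleArray<int, 5> t; // 15 elements of storage for 5 rows
    t.element(2, 1) = 42;    // row 2, column 1
    return t.element(2, 1);
}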
No, it's not possible. In an array, all the elements must have the same type. Two-dimensional arrays are arrays of arrays, which means that all the rows must have the same length. You should probably use a
std::vector<std::vector<int> >
here. Or a one-dimensional array and the logic to compute the 1-dim position from the 2-dim index:
index = row*(row+1)/2 + column.
See iterate matrix without nested loop if you want the reverse indexing.
Edit: fixed my formula which was off by one. Here is a check in Python:
The following index function takes row, col and compute the corresponding index in a one dimensional array using my formula:
>>> index = lambda row, col: row*(row+1)//2 + col  # // = integer division
Here are the coordinate pairs
>>> [[(i,j) for j in range(i+1)] for i in range(5)]
[[(0, 0)],
[(1, 0), (1, 1)],
[(2, 0), (2, 1), (2, 2)],
[(3, 0), (3, 1), (3, 2), (3, 3)],
[(4, 0), (4, 1), (4, 2), (4, 3), (4, 4)]]
I'm now checking that the corresponding indices are the sequence of integers starting from 0 (the indentation of the printed output is mine):
>>> [[index(i,j) for j in range(i+1)] for i in range(5)]
[[0],
[1, 2],
[3, 4, 5],
[6, 7, 8, 9],
[10, 11, 12, 13, 14]]
The nicest thing would be to wrap the whole thing in a class called PascalTriangle and implement it along the following lines:
#include <cassert>
#include <vector>

class PascalTriangle
{
private:
    std::vector<std::vector<int> > m_data;

    std::vector<int> CalculateRow(int row_index) const
    {
        // left as an exercise :)
    }

public:
    PascalTriangle(int num_rows) :
        m_data()
    {
        assert(num_rows >= 0);
        for (int row_index = 0; row_index < num_rows; ++row_index)
        {
            m_data.push_back(CalculateRow(row_index));
        }
    }

    int operator()(int row_index, int column_index) const
    {
        assert(row_index >= 0 && row_index < static_cast<int>(m_data.size()));
        assert(column_index >= 0 && column_index <= row_index); // row r has r + 1 entries
        return m_data[row_index][column_index];
    }
};
Now here comes the catch: this approach allows you to perform lazy evaluation. Consider the following case: you might not always need each and every value. For example, you may only be interested in the 5th row. Then why store the other, unused values?
Based on this idea, here's an advanced version of the previous class:
class PascalTriangle
{
private:
    int m_num_rows;

    std::vector<int> CalculateRow(int row_index) const
    {
        // left as an exercise :)
    }

public:
    PascalTriangle(int num_rows) :
        m_num_rows(num_rows)
    {
        assert(num_rows >= 0);
        // nothing is done here!
    }

    int operator()(int row_index, int column_index) const
    {
        assert(row_index >= 0 && row_index < m_num_rows);
        assert(column_index >= 0 && column_index <= row_index); // row r has r + 1 entries
        return CalculateRow(row_index)[column_index];
    }
};
Notice that the public interface of the class remains exactly the same, yet its internals are completely different. Such are the advantages of proper encapsulation. You effectively centralise error handling and optimisation points.
I hope these ideas inspire you to think more about the operations you want to perform with your Pascal triangle, because they will dictate the most appropriate data structure.
Edit: by request, here are some more explanations:
In the first version, m_data is a vector of vectors. Each contained std::vector<int> represents a row in the triangle.
The operator() function is a syntactical helper, allowing you to access PascalTriangle objects like this:
PascalTriangle my_triangle(10);
int i = my_triangle(3, 2);
assert makes sure that your code does not operate on illegal values, e.g. a negative row count or a row index greater than the triangle's height. But this is just one possible error reporting mechanism. You could also use exceptions, or error return values, or the Fallible idiom (std::optional). See past Stackoverflow questions for which error reporting mechanism to use when. This is a pure software-engineering aspect and has nothing to do with maths, but as you can imagine, it's, well, very important in software :)
CalculateRow returns a std::vector<int> representing the row specified by row_index. To implement it correctly, you'll need some maths. This is what I just found on Google: http://www.mathsisfun.com/pascals-triangle.html
In order to apply the maths, you'll want to know how to calculate n! in C++. There have been a lot of past Stackoverflow questions on this, for example here: Calculating large factorials in C++
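For what it's worth, here is one possible sketch of CalculateRow (my suggestion, not the only option) that sidesteps factorials entirely by using the additive rule: each entry is the sum of the two entries above it. It works with both versions of the class, since it doesn't touch m_data:

std::vector<int> CalculateRow(int row_index) const
{
    std::vector<int> row(1, 1);          // row 0 is {1}
    for (int r = 1; r <= row_index; ++r)
    {
        std::vector<int> next(r + 1, 1); // both ends of every row are 1
        for (int j = 1; j < r; ++j)
            next[j] = row[j - 1] + row[j];
        row.swap(next);
    }
    return row;
}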
Note that with the class approach, you can easily switch to another implementation later on. (You can even take it to the extreme and switch to a specific calculation algorithm based on the triangle height, without the users of the class ever noticing anything! See how powerful proper encapsulation can be?)
In the second version of the class, there is no permanent data storage anymore. CalculateRow is called only if and when needed, but the client of the class doesn't know this. As an additional, possibly performance-improving measure, you could remember rows you have already calculated, for example by adding a private std::map<int, std::vector<int> > member variable whose int key represents the row index and whose value is the corresponding row. Every CalculateRow call would then first look whether the result is already cached, and calculate (and cache) it only if not:
private:
    mutable std::map<int, std::vector<int> > m_cache;

    std::vector<int> CalculateRow(int row_index) const
    {
        // find the element at row_index:
        std::map<int, std::vector<int> >::const_iterator cache_iter =
            m_cache.find(row_index);

        // is it there?
        if (cache_iter != m_cache.end())
        {
            // return its value, no need to calculate it again:
            return cache_iter->second;
        }

        // actual calculation of the row left as an exercise :)
        std::vector<int> result;
        m_cache[row_index] = result;
        return result;
    }
By the way, this would also be a nice application of the new C++11 auto keyword. For example, you'd then just write auto cache_iter = m_cache.find(row_index);
And here's another edit: I made m_cache mutable, because otherwise the code wouldn't compile, as CalculateRow is a const member function (i.e. it shouldn't change an object of the class from the client's point of view). This is a typical idiom for cache member variables.