Sorting elements by decreasing frequency of occurrence - C++

I am trying to solve the question:
Print the elements of an array in decreasing order of frequency; if two numbers have the same frequency, print the one that appeared first. (https://www.geeksforgeeks.org/sort-elements-by-frequency/)
I am trying to implement the solution on my own. I have thought of creating the following data structure:
map<int,pair<int,int>> mymap
I am storing the number itself as the key, and in the pair I am storing the index of the number's first occurrence in the array and its count.
I want to write a custom comparator for sorting the pairs, something like this:
bool cmp(pair<int,int> &a, pair<int,int> &b)
{
    if (a.first == b.first)
        return a < b;
    else
        return a > b;
}
I am still learning to write custom comparators. I can't wrap my head around how to pass a comparator for sorting the map. Also, if the pairs are sorted, will the keys in the map be sorted along with them?
Please let me know! Thanks!

You don't need to use maps for this, or rather, not in this way. You can keep the elements in a plain array arr, then use one map<int,int> cnt that stores the number of occurrences of each element in the array, and another map<int,int> firstIndex that stores the index of each element's first appearance. The sorting function then becomes simply:
bool cmp(int a, int b)
{
    if (cnt[a] != cnt[b]) {
        return cnt[a] > cnt[b];               // higher frequency first
    } else {
        return firstIndex[a] < firstIndex[b]; // on a tie, the earlier first appearance wins
    }
}
(Note that cnt and firstIndex must be visible to cmp, e.g. as globals or via a lambda capture.)
use it like this:
sort(arr, arr+n, cmp);
where n is the number of elements in the array.
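For completeness, here is a minimal end-to-end sketch of this approach (the sample input is made up, and the lambda is just one convenient way to make cnt and firstIndex visible to the comparison):
#include <algorithm>
#include <iostream>
#include <map>
#include <vector>

int main() {
    std::vector<int> arr = {2, 5, 2, 8, 5, 6, 8, 8};

    std::map<int, int> cnt;        // value -> number of occurrences
    std::map<int, int> firstIndex; // value -> index of first appearance
    for (int i = 0; i < (int)arr.size(); ++i) {
        ++cnt[arr[i]];
        if (firstIndex.find(arr[i]) == firstIndex.end())
            firstIndex[arr[i]] = i;
    }

    std::sort(arr.begin(), arr.end(), [&](int a, int b) {
        if (cnt[a] != cnt[b])
            return cnt[a] > cnt[b];           // higher frequency first
        return firstIndex[a] < firstIndex[b]; // ties: earlier first appearance
    });

    for (int x : arr) std::cout << x << ' ';  // prints: 8 8 8 2 2 5 5 6
    std::cout << '\n';
}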

Will std::sort always compare equal values?

I am doing the following problem on leetcode: https://leetcode.com/problems/contains-duplicate/
Given an integer array nums, return true if any value appears at least
twice in the array, and return false if every element is distinct.
The solution I came up with is the following:
class Solution {
public:
    bool containsDuplicate(vector<int>& nums) {
        try {
            std::sort(nums.begin(), nums.end(), [](int a, int b) {
                if (a == b) {
                    throw std::runtime_error("found duplicate");
                }
                return a < b;
            });
        } catch (const std::runtime_error& e) {
            return true;
        }
        return false;
    }
};
It was accepted on leetcode but I am still not sure it will always work. The idea is to start sorting the nums array and to bail out as soon as duplicate values are compared inside the comparator. A sorting algorithm can compare elements in many different orders. I expect that equal elements will always be compared, but I am not sure about this. Will std::sort always compare equal values, or can it sometimes skip comparing them, so that duplicate values will not be found?
Will std::sort always compare equal values, or can it sometimes skip comparing them, so that duplicate values will not be found?
Yes: if duplicates do exist, some pair of equal-valued elements will always be compared.
Let us assume the opposite: the initial array of elements {e} contains a subset of elements having the same value, and a valid sorting algorithm does not call the comparison operator < on any pair of elements from that subset.
Now construct an array of tuples {e,k} of the same size, where the first tuple component comes from the initial array and the second component k is chosen arbitrarily, and apply the same sorting algorithm using the lexicographic comparison operator for the tuples. The order of the tuples after sorting can deviate from the order of the sorted elements {e} only among same-valued elements, where in the tuple array it depends on the second component k.
Since we assumed that the sorting algorithm does not compare any pair of same-valued elements, it will not compare tuples with the same first component, so it will be unable to sort them properly. This contradicts our assumption and proves that some equal-valued elements (if they exist in the array) will always be compared during sorting.
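As an aside, if you would rather not rely on this property at all, here is a hedged alternative sketch (not the asker's code): sort first, then scan for adjacent equal elements.
#include <algorithm>
#include <vector>

// Duplicate check that does not depend on which pairs std::sort chooses to compare.
bool containsDuplicate(std::vector<int> nums) {
    std::sort(nums.begin(), nums.end());  // equal values become adjacent
    return std::adjacent_find(nums.begin(), nums.end()) != nums.end();
}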

How to insert to a vector to ensure it remains sorted? [duplicate]

ALL,
This question is a continuation of this one.
I think that the STL misses this functionality, but that's just my IMHO.
Now, to the question.
Consider the following code:
class Foo
{
public:
    Foo();
    int paramA, paramB;
    std::string name;
};

struct Sorter
{
    bool operator()(const Foo &foo1, const Foo &foo2) const
    {
        switch( paramSorter )
        {
            case 1:
                return foo1.paramA < foo2.paramA;
            case 2:
                return foo1.paramB < foo2.paramB;
            default:
                return foo1.name < foo2.name;
        }
    }

    int paramSorter;
};

int main()
{
    std::vector<Foo> foo;
    Sorter sorter;
    sorter.paramSorter = 0;
    // fill the vector
    std::sort( foo.begin(), foo.end(), sorter );
}
At any given moment the vector can be re-sorted.
The class also has getter methods, which are used in the sorter structure.
What would be the most efficient way to insert a new element into the vector?
The situation I have is:
I have a grid (spreadsheet) that displays the sorted vector of a class. At any given time the vector can be re-sorted, and the grid will display the sorted data accordingly.
Now I need to insert a new element into the vector/grid.
I could insert, then re-sort, and then re-display the whole grid, but this is very inefficient, especially for a big grid.
Any help would be appreciated.
The simple answer to the question:
template< typename T >
typename std::vector<T>::iterator
insert_sorted( std::vector<T> & vec, T const& item )
{
    return vec.insert
    (
        std::upper_bound( vec.begin(), vec.end(), item ),
        item
    );
}
Version with a predicate.
template< typename T, typename Pred >
typename std::vector<T>::iterator
insert_sorted( std::vector<T> & vec, T const& item, Pred pred )
{
    return vec.insert
    (
        std::upper_bound( vec.begin(), vec.end(), item, pred ),
        item
    );
}
Where Pred is a strict weak ordering predicate on type T.
For this to work, the input vector must already be sorted with respect to this predicate.
The complexity of doing this is O(log N) for the upper_bound search (finding where to insert) but up to O(N) for the insert itself.
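For example, assuming the Foo, Sorter, and insert_sorted definitions above, and a vector foo already sorted with the same sorter, usage would look something like this (a fragment, not a complete program):
Sorter sorter;
sorter.paramSorter = 1;               // order by paramA, as in the question's switch

Foo newItem;                          // assume its fields have been filled in
insert_sorted(foo, newItem, sorter);  // foo stays sorted by paramA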
For a better complexity you could use std::set<T> if there are not going to be any duplicates or std::multiset<T> if there may be duplicates. These will retain a sorted order for you automatically and you can specify your own predicate on these too.
There are various other things you could do which are more complex, e.g. manage a vector and a set / multiset / sorted vector of newly added items then merge these in when there are enough of them. Any kind of iterating through your collection will need to run through both collections.
Using a second vector has the advantage of keeping your data compact. Here your "newly added" items vector will be relatively small, so the insertion time will be O(M), where M is the size of this vector, which may be more palatable than the O(N) of inserting into the big vector every time. The merge is O(N+M), which is better than the O(NM) of inserting the new items into the big vector one at a time; in total it is O(M²) to insert M elements into the small vector plus O(N+M) for the merge.
You would probably keep the insertion vector at its capacity too, so as you grow that you will not be doing any reallocations, just moving of elements.
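A rough sketch of that two-vector idea (the names BatchedSortedVector, pending, and threshold are mine, not from the answer): keep a small sorted buffer of new items and fold it into the main vector with std::inplace_merge once it grows past some threshold.
#include <algorithm>
#include <vector>

// Sketch only: keeps `main` sorted, batches insertions through `pending`.
template <typename T, typename Pred>
struct BatchedSortedVector {
    std::vector<T> main;          // large, sorted
    std::vector<T> pending;       // small, sorted buffer of new items
    Pred pred;
    std::size_t threshold = 64;   // tune for your workload

    void insert(T const& item) {
        pending.insert(std::upper_bound(pending.begin(), pending.end(), item, pred), item);
        if (pending.size() >= threshold)
            flush();
    }

    void flush() {
        std::size_t mid = main.size();
        main.insert(main.end(), pending.begin(), pending.end());
        std::inplace_merge(main.begin(), main.begin() + mid, main.end(), pred);
        pending.clear();
    }
};
As noted above, any lookup or iteration between flushes has to consult both vectors.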
If you need to keep the vector sorted all the time, first you might consider whether using std::set or std::multiset won't simplify your code.
If you really need a sorted vector and want to insert an element into it quickly, but do not want to enforce the sorting criterion at all times, then you can first use std::lower_bound() to find, in logarithmic time, the position in the sorted range where the element should be inserted, and then use the vector's insert() member function to insert the element at that position.
If performance is an issue, consider benchmarking std::list vs std::vector. For small items, std::vector is known to be faster because of a higher cache hit rate, but the insert() operation itself is computationally faster on lists (no need to move elements around).
Just a note: you can use upper_bound as well, depending on your needs. upper_bound ensures that new entries which are equivalent to existing ones appear at the end of their sequence, while lower_bound ensures they appear at the beginning of their sequence. This can be useful for certain implementations (maybe classes that can share a "position" but not all of their details!).
Both guarantee that the vector remains sorted according to operator< on the elements, although inserting at the lower_bound position means moving more elements.
Example:
insert 7 # lower_bound of { 5, 7, 7, 9 } => { 5, *7*, 7, 7, 9 }
insert 7 # upper_bound of { 5, 7, 7, 9 } => { 5, 7, 7, *7*, 9 }
Instead of inserting and re-sorting, you should do a find and then insert.
Keep the vector sorted (sort once). When you have to insert:
find the first element that compares greater than the one you are going to insert, and
do the insert just before that position.
This way the vector stays sorted.
Here is an example of how it goes (positions here are 1-based):
start {} empty vector
insert 1 -> find first greater returns end() = 1 -> insert at 1 -> {1}
insert 5 -> find first greater returns end() = 2 -> insert at 2 -> {1,5}
insert 3 -> find first greater returns 2 -> insert at 2 -> {1,3,5}
insert 4 -> find first greater returns 3 -> insert at 3 -> {1,3,4,5}
When you want to switch between sort orders, you can use multiple index data structures, each of which you keep in sorted order (probably some kind of balanced tree, like std::map, which maps sort keys to vector indices, or std::set storing pointers to your objects, each with a different comparison function).
Here's a library which does this: http://www.boost.org/doc/libs/1_53_0/libs/multi_index/doc/index.html
For every change (insert of new elements or update of keys) you must update all index data structures, or flag them as invalid.
This works if there are not "too many" sort orders and not "too many" updates of your data structure. Otherwise, bad luck: you have to re-sort every time you want to change the order.
In other words: the more indices you need (to speed up lookup operations), the more time you need for update operations. And every index needs memory, of course.
To keep the number of indices small, you could use some query engine which combines the indices of several fields to support more complex sort orders over several fields, like an SQL query optimizer. But that may be overkill...
Example: If you have two fields, a and b, you can support 4 sort orders:
1. a
2. b
3. first a, then b
4. first b, then a
with 2 indices (3. and 4.).
With more fields, the number of possible sort-order combinations grows quickly. But you can still use an index which sorts "almost as you want it" and, during the query, sort the remaining fields that the index couldn't cover, as needed. For sorted output of the whole data set this doesn't help much, but if you only want to look up some elements, the initial "narrowing down" can help a lot.
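Below is a rough standard-library sketch of this multiple-index idea (the types ByA and ByB and the reserve-based pointer-stability trick are mine, not from the answer); Boost.MultiIndex packages the same idea more robustly:
#include <set>
#include <string>
#include <vector>

struct Foo { int paramA, paramB; std::string name; };

struct ByA { bool operator()(const Foo *l, const Foo *r) const { return l->paramA < r->paramA; } };
struct ByB { bool operator()(const Foo *l, const Foo *r) const { return l->paramB < r->paramB; } };

int main() {
    std::vector<Foo> storage;
    storage.reserve(100);  // keep pointers stable: no reallocation below this capacity

    std::multiset<const Foo*, ByA> byA;  // index ordered by paramA
    std::multiset<const Foo*, ByB> byB;  // index ordered by paramB

    auto add = [&](const Foo &f) {
        storage.push_back(f);
        byA.insert(&storage.back());
        byB.insert(&storage.back());
    };

    add({2, 9, "x"});
    add({1, 3, "y"});
    // Iterating byA or byB now yields the same objects in two different sort orders.
}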
Here is one I wrote for simplicity. Horribly slow for large sets but fine for small sets. It sorts as values are added:
void InsertionSortByValue(vector<int> &vec, int val)
{
    vector<int>::iterator it;
    for (it = vec.begin(); it < vec.end(); it++)
    {
        if (val < *it)
        {
            vec.insert(it, val);
            return;
        }
    }
    vec.push_back(val);
}

int main()
{
    vector<int> vec;
    for (int i = 0; i < 20; i++)
        InsertionSortByValue(vec, rand()%20);
}
Here is another one I found on some website. It sorts a vector that has already been filled:
void InsertionSortFromArray(vector<int> &vec)
{
    int elem;
    unsigned int i, j, k, index;
    for (i = 1; i < vec.size(); i++)
    {
        elem = vec[i];
        if (elem < vec[i-1])
        {
            for (j = 0; j <= i; j++)
            {
                if (elem < vec[j])
                {
                    index = j;
                    for (k = i; k > j; k--)
                        vec[k] = vec[k-1];
                    break;
                }
            }
        }
        else
            continue;
        vec[index] = elem;
    }
}

int main()
{
    vector<int> vec;
    for (int i = 0; i < 20; i++)
        vec.push_back(rand()%20);
    InsertionSortFromArray(vec);
}
Assuming you really want to use a vector, and the sort criterion or keys don't change (so the order of already inserted elements always stays the same):
Insert the element at the end, then move it toward the front one step at a time, until the preceding element isn't bigger.
It can't be done faster (in terms of asymptotic complexity, or "big O notation"), because you must move all of the bigger elements. And that's the reason why the STL doesn't provide this: it's inefficient on vectors, and you shouldn't use them if you need this operation.
Edit: Another assumption: comparing the elements is not much more expensive than moving them. See comments.
Edit 2: As my first assumption doesn't hold (you want to change the sort criterion), scrap this answer and see my other one: https://stackoverflow.com/a/15843955/1413374

Efficient Data Structure for Insertion

I'm looking for a data structure (array-like) that allows fast (faster than O(N)) arbitrary insertion of values into the structure. The data structure must be able to print out its elements in the way they were inserted. This is similar to something like List.Insert() (which is too slow as it has to shift every element over), except I don't need random access or deletion. Insertion will always be within the size of the 'array'. All values are unique. No other operations are needed.
For example, if Insert(x, i) inserts value x at index i (0-indexing). Then:
Insert(1, 0) gives {1}
Insert(3, 1) gives {1,3}
Insert(2, 1) gives {1,2,3}
Insert(5, 0) gives {5,1,2,3}
And it'll need to be able to print out {5,1,2,3} at the end.
I am using C++.
Use a skip list. Another option would be a tiered vector. A skip list performs inserts in O(log(n)) and keeps the numbers in order. A tiered vector supports insert in O(sqrt(n)) and can likewise print the elements in order.
EDIT: per the comment of amit, here is how you find the k-th element in a skip list:
Each element has a tower of links to later elements, and for each link you know how many elements it jumps over. So, looking for the k-th element, you start at the head of the list and go down the tower until you find a link that jumps over no more than k elements. You follow that link to the node it points to and decrease k by the number of elements you jumped over. Continue doing that until k = 0.
Did you consider using std::map or std::vector ?
You could use a std::map with the rank of insertion as key. And vector has a reserve member function.
You can use an std::map mapping (index, insertion-time) pairs to values, where insertion-time is an "autoincrement" integer (in SQL terms). The ordering on the pairs should be
(i, t) < (i*, t*)
iff
i < i*, or i = i* and t > t*
(i.e. by index ascending, and among entries made at the same index, the most recent insertion first).
In code:
struct lt {
    bool operator()(std::pair<size_t, size_t> const &x,
                    std::pair<size_t, size_t> const &y) const
    {
        // index ascending; for equal indices, later insertion time first
        return x.first < y.first ||
               (x.first == y.first && x.second > y.second);
    }
};

typedef std::map<std::pair<size_t, size_t>, int, lt> array_like;

void insert(array_like &a, int value, size_t i)
{
    a[std::make_pair(i, a.size())] = value;
}
Regarding your comment:
List.Insert() (which is too slow as it has to shift every element over),
Lists don't shift their values; they iterate over them to find the location where you want to insert. Be careful what you say, as this can be confusing to newbies like me.
A solution that's included with GCC by default is the rope data structure. Here is the documentation. Typically, ropes come to mind when working with long strings of characters. Here we have ints instead of characters, but it works the same. Just use int as the template parameter. (Could also be pairs, etc.)
Here's the description of rope on Wikipedia.
Basically, it's a binary tree that maintains how many elements are in the left and right subtrees (or equivalent information, which is what's referred to as order statistics), and these counts are updated appropriately as subtrees are rotated when elements are inserted and removed. This allows O(lg n) operations.
There's this data structure which pushes insertion time down from O(N) to O(sqrt(N)) but I'm not that impressed. I feel one should be able to do better but I'll have to work at it a bit.
In C++ you can just use a map of vectors, like so:
#include <iostream>
#include <map>
#include <vector>
using namespace std;

int main() {
    map<int, vector<int> > data;
    data[0].push_back(1);
    data[1].push_back(3);
    data[1].push_back(2);
    data[0].push_back(5);

    map<int, vector<int> >::iterator it;
    for (it = data.begin(); it != data.end(); it++) {
        vector<int> v = it->second;
        for (int i = v.size() - 1; i >= 0; i--) {
            cout << v[i] << ' ';
        }
    }
    cout << '\n';
}
This prints:
5 1 2 3
Just like you want, and inserts are O(log n).

How do I remove duplicates from a C++ array?

I have an array of structs; the array is of size N.
I want to remove duplicates from the array; that is, do an in-place change, converting the array to have a single appearance of each struct. Additionally, I want to know the new size M (highest index in the reduced array).
The structs include primitives so it's trivial to compare them.
How can I do that efficiently in C++?
I have implemented the following operators:
bool operator==(const A &rhs1, const A &rhs2)
{
    return ( ( rhs1.x == rhs2.x ) &&
             ( rhs1.y == rhs2.y ) );
}

bool operator<(const A &rhs1, const A &rhs2)
{
    if ( rhs1.x == rhs2.x )
        return ( rhs1.y < rhs2.y );
    return ( rhs1.x < rhs2.x );
}
However, I get an error when running:
std::sort(array, array + numTotalAvailable);   // array has all valid elements here
std::unique_copy(
    array,
    array + numTotalAvailable,
    back_inserter(uniqueElements));            // uniqueElements ends up with non-valid elements
What is wrong here?
You could use a combination of the std::sort and std::unique algorithms to accomplish this:
std::sort(elems.begin(), elems.end()); // Now in sorted order.
auto itr = std::unique(elems.begin(), elems.end()); // Duplicates overwritten
elems.erase(itr, elems.end()); // Space reclaimed
If you are working with a raw array (not, say, a std::vector), then you can't actually reclaim the space without copying the elements over to a new range. However, if you're okay starting off with a raw array and ending up with something like a std::vector or std::deque, you can use unique_copy and an iterator adapter to copy over just the unique elements:
std::sort(array, array + size); // Now in sorted order
std::vector<T> uniqueElements;
std::unique_copy(array, array + size,
                 back_inserter(uniqueElements)); // Append unique elements
At this point, uniqueElements now holds all the unique elements.
Finally, to more directly address your initial question: if you want to do this in-place, you can get the answer by using the return value from unique to determine how many elements remain:
std::sort(elems, elems + N); // Now in sorted order.
T* endpoint = std::unique(elems, elems + N);// Duplicates overwritten
ptrdiff_t M = endpoint - elems; // Find number of elements left
Hope this helps!
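Putting those pieces together with the struct from the question, here is a self-contained sketch (the members x and y follow the operators shown above; the sample data is made up):
#include <algorithm>
#include <cstddef>
#include <iostream>

struct A { int x, y; };

bool operator==(const A &lhs, const A &rhs) { return lhs.x == rhs.x && lhs.y == rhs.y; }
bool operator<(const A &lhs, const A &rhs)
{
    if (lhs.x == rhs.x) return lhs.y < rhs.y;
    return lhs.x < rhs.x;
}

int main()
{
    A array[] = { {1, 2}, {3, 4}, {1, 2}, {5, 6}, {3, 4} };
    const std::size_t N = sizeof(array) / sizeof(array[0]);

    std::sort(array, array + N);                 // group duplicates together (uses operator<)
    A *endpoint = std::unique(array, array + N); // keep the first of each run (uses operator==)
    std::ptrdiff_t M = endpoint - array;         // number of unique elements

    std::cout << M << " unique elements\n";      // prints: 3 unique elements
    for (A *p = array; p != endpoint; ++p)
        std::cout << '(' << p->x << ',' << p->y << ") ";
    std::cout << '\n';
}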
std::set<T> uniqueItems(v.begin(), v.end());
Now uniqueItems contains only the unique items. Do whatever you want to do with it. Maybe, you would like v to contain all the unique items. If so, then do this:
//assuming v is std::vector<T>
std::vector<T>(uniqueItems.begin(), uniqueItems.end()).swap(v);
Now v contains all the unique items. It also shrinks v to a minimum size. It makes use of Shrink-to-fit idiom.
You could use the flyweight pattern. Easiest way to do so, would be using the Boost Flyweight library.
Edit: I'm not sure if there is some way to find out how many objects are stored by the Boost flyweight implementation, if there is, I can't seem to find it in the documentation.
An alternative approach to applying algorithms to your array would be to insert its elements in a std::set. Whether it is reasonable to do it this way depends on how you plan to use your items.