squeeze table: same row overlap together, and count the number - c++

Suppose I have a table like this (the table is a 2-D array in C++).
The number at the start of each row is that row's count.
1 a b c
1 a b c
1 c d e
1 b c d
1 b c d
It will be squeezed to:
2 a b c
1 c d e
2 b c d
My algorithm is O(n*n). Can someone improve it?
suppose t1 is the original table;
initialize another table t2;
row_num = 1;
copy the first row of t1 to t2;
foreach row in t1 (1 to n)
    search each row in t2 (0 to row_num);
    if equal, then add the count to that row of t2;
        break;
    if not found, then copy the current row of t1 to t2;
        row_num++

If equal rows are already adjacent, as in the example, a single pass over the table is O(n).
Otherwise, use std::sort (or any other O(n log n) sort) to order the rows first. Equal rows then sit next to each other, and one more pass merges them :)
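The single merging pass could look like the minimal sketch below. It assumes each row is stored as a count plus a fixed three-character key; the Row alias and the squeeze_adjacent name are illustrative and do not come from the question. It only merges neighbouring rows, so the input must already be grouped or sorted.

```cpp
#include <array>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical row type: a count plus a three-character key.
using Row = std::pair<int, std::array<char, 3>>;

// Merge runs of adjacent rows with equal keys, summing their counts.
// Assumes equal rows are already next to each other (sort first otherwise).
std::vector<Row> squeeze_adjacent(const std::vector<Row>& rows) {
    std::vector<Row> out;
    for (const Row& r : rows) {
        if (!out.empty() && out.back().second == r.second)
            out.back().first += r.first;  // same key as the previous row: add the count
        else
            out.push_back(r);             // new key: start a new output row
    }
    return out;
}
```

Each input row is visited once and compared only with the last output row, which is where the O(n) bound comes from.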

Here's a working example with O(N log N) complexity. It first sorts the data, then loops over the elements: for each one it finds the end of the run of equal rows by looking for the first mismatch, sums the counts in that run, and stores the total in a result vector. Note that the counts in your initial array can also differ from 1. The comparator only has to compare the array part of each pair, with no element-by-element logic, because std::array already has a lexicographic operator<.
The code below uses C++11 features (auto, lambdas) that might not work on older compilers. You could also use initializer lists to initialize the vector in one statement, but with the nested vector of pairs of int and array, I got a little confused about how many braces I needed to write :-)
#include <algorithm>
#include <array>
#include <iostream>
#include <numeric> // std::accumulate
#include <utility>
#include <vector>
typedef std::pair<int, std::array<char, 3> > Element;
std::vector< Element > v;
std::vector< Element > result;
int main()
{
v.push_back( Element(1, std::array<char, 3>{{'a', 'b', 'c'}}) );
v.push_back( Element(2, std::array<char, 3>{{'a', 'b', 'c'}}) );
v.push_back( Element(1, std::array<char, 3>{{'c', 'd', 'e'}}) );
v.push_back( Element(1, std::array<char, 3>{{'b', 'c', 'd'}}) );
v.push_back( Element(3, std::array<char, 3>{{'b', 'c', 'd'}}) );
// O(N log(N) ) complexity
std::sort(v.begin(), v.end(), [](Element const& e1, Element const& e2){
// compare the array part of the pair<int, array>
return e1.second < e2.second;
});
// O(N) complexity
for (auto it = v.begin(); it != v.end();) {
// find next element
auto last = std::find_if(it, v.end(), [=](Element const& elem){
return it->second != elem.second;
});
// accumulate the counts
auto count = std::accumulate(it, last, 0, [](int sub, Element const& elem) {
return sub + elem.first;
});
// store count in result
result.push_back( Element(count, it->second) );
it = last;
}
for (auto it = result.begin(); it != result.end(); ++it) {
std::cout << it->first << " ";
for (std::size_t i = 0; i < 3; ++i)
std::cout << it->second[i] << " ";
std::cout << "\n";
}
}
NOTE: the loop over the sorted elements might seem O(N^2) (a linear std::find_if nested inside a linear for), but it is O(N) because of the last loop statement it = last that skips over already searched elements.


How do you "align" a sort across multiple vectors? [duplicate]

Using C++, and hopefully the standard library, I want to sort a sequence of samples in ascending order, but I also want to remember the original indexes of the new samples.
For example, I have a set, or vector, or matrix of samples A : [5, 2, 1, 4, 3]. I want to sort these to be B : [1,2,3,4,5], but I also want to remember the original indexes of the values, so I can get another set which would be:
C : [2, 1, 4, 3, 0 ] - which corresponds to the index of each element in 'B', in the original 'A'.
For example, in Matlab you can do:
[a,b]=sort([5, 8, 7])
a = 5 7 8
b = 1 3 2
Can anyone see a good way to do this?
Using C++ 11 lambdas:
#include <iostream>
#include <vector>
#include <numeric> // std::iota
#include <algorithm> // std::sort, std::stable_sort
using namespace std;
template <typename T>
vector<size_t> sort_indexes(const vector<T> &v) {
// initialize original index locations
vector<size_t> idx(v.size());
iota(idx.begin(), idx.end(), 0);
// sort indexes based on comparing values in v
// using std::stable_sort instead of std::sort
// to avoid unnecessary index re-orderings
// when v contains elements of equal values
stable_sort(idx.begin(), idx.end(),
[&v](size_t i1, size_t i2) {return v[i1] < v[i2];});
return idx;
}
Now you can use the returned index vector in iterations such as
for (auto i: sort_indexes(v)) {
cout << v[i] << endl;
}
You can also choose to supply your original index vector, sort function, comparator, or automatically reorder v in the sort_indexes function using an extra vector.
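As a rough sketch of the "reorder using an extra vector" idea, the helper below builds a reordered copy from an index vector such as the one sort_indexes returns. The name apply_indexes is made up for illustration, and the sketch assumes the index vector is a permutation of 0..v.size()-1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build a reordered copy of v where out[k] = v[idx[k]].
// idx is assumed to be a permutation of 0..v.size()-1, for example
// the index vector produced by an index sort.
template <typename T>
std::vector<T> apply_indexes(const std::vector<T>& v,
                             const std::vector<std::size_t>& idx) {
    assert(idx.size() == v.size());
    std::vector<T> out;
    out.reserve(v.size());
    for (std::size_t i : idx)
        out.push_back(v[i]);
    return out;
}
```

With A = [5, 2, 1, 4, 3] and C = [2, 1, 4, 3, 0] from the question, apply_indexes(A, C) yields B = [1, 2, 3, 4, 5].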
You could sort std::pair instead of just ints - first int is original data, second int is original index. Then supply a comparator that only sorts on the first int. Example:
Your problem instance: v = [5 8 7]
New problem instance: v_prime = [<5,0>, <8,1>, <7,2>]
Sort the new problem instance using a comparator like:
typedef std::pair<int,int> mypair;
bool comparator ( const mypair& l, const mypair& r)
{ return l.first < r.first; }
// forgetting the syntax here but intent is clear enough
The result of std::sort on v_prime, using that comparator, should be:
v_prime = [<5,0>, <7,2>, <8,1>]
You can peel out the indices by walking the vector, grabbing .second from each std::pair.
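Put together, the pair approach might look like the sketch below. The function name sorted_indices_via_pairs is hypothetical, the index is stored as std::size_t rather than int, and std::stable_sort is swapped in for std::sort so that equal values keep their original relative order; none of these choices come from the answer itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sort (value, original index) pairs by value only, then return the indices.
// std::stable_sort keeps equal values in their original relative order.
std::vector<std::size_t> sorted_indices_via_pairs(const std::vector<int>& v) {
    typedef std::pair<int, std::size_t> mypair;
    std::vector<mypair> v_prime;
    v_prime.reserve(v.size());
    for (std::size_t i = 0; i < v.size(); ++i)
        v_prime.push_back(mypair(v[i], i));

    std::stable_sort(v_prime.begin(), v_prime.end(),
                     [](const mypair& l, const mypair& r) { return l.first < r.first; });

    // Peel out the indices by walking the sorted pairs.
    std::vector<std::size_t> indices;
    indices.reserve(v_prime.size());
    for (const mypair& p : v_prime)
        indices.push_back(p.second);
    return indices;
}
```

For v = [5 8 7] this returns the zero-based indices [0, 2, 1], matching the .second fields of the sorted v_prime.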
Suppose the given vector is
A=[2,4,3]
Create a new vector
V=[0,1,2] // indicating positions
Sort V, but instead of comparing the elements of V, compare the corresponding elements of A
//Assume A is a given vector with N elements
vector<int> V(N);
std::iota(V.begin(),V.end(),0); //Initializing
sort( V.begin(),V.end(), [&](int i,int j){return A[i]<A[j];} );
// n is the number of elements, assumed to be set already
int k;
vector<pair<int,int> > a;
for (int i = 0; i < n; i++) {
    // filling the original array
    cin >> k;
    a.push_back (make_pair (k,i)); // k = value, i = original index
}
sort (a.begin(),a.end()); // pairs compare by value first, then by index
for (int i = 0; i < n; i++) {
    cout << a[i].first << " " << a[i].second << "\n";
}
Now a contains both the values and their original indices, in sorted order.
a[i].first = the i-th value in sorted order.
a[i].second = its index in the initial array.
I wrote generic version of index sort.
template <class RAIter, class Compare>
void argsort(RAIter iterBegin, RAIter iterEnd, Compare comp,
std::vector<size_t>& indexes) {
std::vector< std::pair<size_t,RAIter> > pv ;
pv.reserve(iterEnd - iterBegin) ;
RAIter iter ;
size_t k ;
for (iter = iterBegin, k = 0 ; iter != iterEnd ; iter++, k++) {
pv.push_back( std::pair<size_t,RAIter>(k,iter) ) ;
}
std::sort(pv.begin(), pv.end(),
[&comp](const std::pair<size_t,RAIter>& a, const std::pair<size_t,RAIter>& b) -> bool
{ return comp(*a.second, *b.second) ; }) ;
indexes.resize(pv.size()) ;
std::transform(pv.begin(), pv.end(), indexes.begin(),
[](const std::pair<size_t,RAIter>& a) -> size_t { return a.first ; }) ;
}
Usage is the same as that of std::sort except for an index container to receive sorted indexes.
testing:
int a[] = { 3, 1, 0, 4 } ;
std::vector<size_t> indexes ;
argsort(a, a + sizeof(a) / sizeof(a[0]), std::less<int>(), indexes) ;
for (size_t i : indexes) printf("%d\n", int(i)) ;
You should get 2 1 0 3.
For compilers without C++0x support, replace the lambda expression with a class template:
template <class RAIter, class Compare>
class PairComp {
public:
Compare comp ;
PairComp(Compare comp_) : comp(comp_) {}
bool operator() (const std::pair<size_t,RAIter>& a,
const std::pair<size_t,RAIter>& b) const { return comp(*a.second, *b.second) ; }
} ;
and rewrite the std::sort call as
std::sort(pv.begin(), pv.end(), PairComp<RAIter, Compare>(comp)) ;
I came across this question and figured out that sorting the iterators directly is a way to sort the values and keep track of the indices. There is no need to define an extra container of (value, index) pairs, which is helpful when the values are large objects. The iterators provide access to both the value and the index:
/*
* a function object that allows to compare
* the iterators by the value they point to
*/
template < class RAIter, class Compare >
class IterSortComp
{
public:
IterSortComp ( Compare comp ): m_comp ( comp ) { }
inline bool operator( ) ( const RAIter & i, const RAIter & j ) const
{
return m_comp ( * i, * j );
}
private:
const Compare m_comp;
};
template <class INIter, class RAIter, class Compare>
void itersort ( INIter first, INIter last, std::vector < RAIter > & idx, Compare comp )
{
idx.resize ( std::distance ( first, last ) );
for ( typename std::vector < RAIter >::iterator j = idx.begin( ); first != last; ++ j, ++ first )
* j = first;
std::sort ( idx.begin( ), idx.end( ), IterSortComp< RAIter, Compare > ( comp ) );
}
as for the usage example:
std::vector < int > A ( n );
// populate A with some random values
std::generate ( A.begin( ), A.end( ), rand );
std::vector < std::vector < int >::const_iterator > idx;
itersort ( A.begin( ), A.end( ), idx, std::less < int > ( ) );
Now, for example, idx[ 5 ] refers to the element at position 5 (zero-based) of the sorted order: its value is *idx[ 5 ], and its index in the original vector is std::distance( A.cbegin( ), idx[ 5 ] ) or simply idx[ 5 ] - A.cbegin( ).
Consider using std::multimap as suggested by @Ulrich Eckhardt. The code can be made even simpler.
Given
std::vector<int> a = {5, 2, 1, 4, 3}; // a: 5 2 1 4 3
To sort during insertion
std::multimap<int, std::size_t> mm;
for (std::size_t i = 0; i != a.size(); ++i)
mm.insert({a[i], i});
To retrieve values and original indices
std::vector<int> b;
std::vector<std::size_t> c;
for (const auto & kv : mm) {
b.push_back(kv.first); // b: 1 2 3 4 5
c.push_back(kv.second); // c: 2 1 4 3 0
}
The reason to prefer a std::multimap to a std::map is to allow equal values in original vectors. Also please note that, unlike for std::map, operator[] is not defined for std::multimap.
There is another way to solve this, using a map:
vector<double> v = {...}; // input data
map<double, unsigned> m; // mapping from value to its index
for (auto it = v.begin(); it != v.end(); ++it)
m[*it] = it - v.begin();
This will eradicate non-unique elements though. If that's not acceptable, use a multimap:
vector<double> v = {...}; // input data
multimap<double, unsigned> m; // mapping from value to its index
for (auto it = v.begin(); it != v.end(); ++it)
m.insert(make_pair(*it, it - v.begin()));
In order to output the indices, iterate over the map or multimap:
for (auto it = m.begin(); it != m.end(); ++it)
cout << it->second << endl;
Beautiful solution by @Lukasz Wiklendt! In my case I needed something more generic, so I modified it a bit:
template <class RAIter, class Compare>
vector<size_t> argSort(RAIter first, RAIter last, Compare comp) {
vector<size_t> idx(last-first);
iota(idx.begin(), idx.end(), 0);
auto idxComp = [&first,comp](size_t i1, size_t i2) {
return comp(first[i1], first[i2]);
};
sort(idx.begin(), idx.end(), idxComp);
return idx;
}
Example: Find indices sorting a vector of strings by length, except for the first element which is a dummy.
vector<string> test = {"dummy", "a", "abc", "ab"};
auto comp = [](const string &a, const string& b) {
return a.length() > b.length();
};
const auto& beginIt = test.begin() + 1;
vector<size_t> ind = argSort(beginIt, test.end(), comp);
for(auto i : ind)
cout << beginIt[i] << endl;
prints:
abc
ab
a
Build a std::pair of (index, iterator) for each element inside the function, then sort the pairs.
Generic version:
template< class RandomAccessIterator,class Compare >
auto sort2(RandomAccessIterator begin,RandomAccessIterator end,Compare cmp) ->
std::vector<std::pair<std::uint32_t,RandomAccessIterator>>
{
using valueType=typename std::iterator_traits<RandomAccessIterator>::value_type;
using Pair=std::pair<std::uint32_t,RandomAccessIterator>;
std::vector<Pair> index_pair;
index_pair.reserve(std::distance(begin,end));
for(uint32_t idx=0;begin!=end;++begin,++idx){
index_pair.push_back(Pair(idx,begin));
}
std::sort( index_pair.begin(),index_pair.end(),[&](const Pair& lhs,const Pair& rhs){
return cmp(*lhs.second,*rhs.second);
});
return index_pair;
}
Well, my solution uses a residue technique. We can place the values being sorted in the upper 2 bytes and the indices of the elements in the lower 2 bytes:
int myints[] = {32,71,12,45,26,80,53,33};
for (int i = 0; i < 8; i++)
myints[i] = myints[i]*(1 << 16) + i;
Then sort the array myints as usual:
std::vector<int> myvector(myints, myints+8);
sort(myvector.begin(), myvector.begin()+8, std::less<int>());
After that you can recover the elements' indices as the residue modulo 2^16. The following code prints the indices of the values sorted in ascending order:
for (std::vector<int>::iterator it = myvector.begin(); it != myvector.end(); ++it)
std::cout << ' ' << (*it)%(1 << 16);
Of course, this technique works only for relatively small non-negative values in the original array myints (i.e. those that fit into the upper 2 bytes of an int) and for fewer than 2^16 elements. But it has the additional benefit of distinguishing identical values of myints: their indices will be printed in their original order.
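A variant of the same packing idea, written as a sketch with the limits made explicit. It uses unsigned 32-bit words with shifts and masks instead of the multiply and modulo on int shown above, which is equivalent for non-negative values. The function name packed_sort_indices is hypothetical, and the code assumes every value is at most 65535 and there are at most 65536 elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack each value into the upper 16 bits and its index into the lower 16 bits,
// sort the packed words, then unpack the indices.
// Assumes every value is in [0, 65535] and there are at most 65536 elements.
std::vector<std::size_t> packed_sort_indices(const std::vector<std::uint32_t>& values) {
    assert(values.size() <= 65536u);
    std::vector<std::uint32_t> packed;
    packed.reserve(values.size());
    for (std::size_t i = 0; i < values.size(); ++i) {
        assert(values[i] <= 0xFFFFu);
        packed.push_back((values[i] << 16) | static_cast<std::uint32_t>(i));
    }
    std::sort(packed.begin(), packed.end());

    std::vector<std::size_t> indices;
    indices.reserve(packed.size());
    for (std::uint32_t p : packed)
        indices.push_back(p & 0xFFFFu);  // the lower 16 bits hold the original index
    return indices;
}
```

For the values 32, 71, 12, 45, 26, 80, 53, 33 this returns the indices 2, 4, 0, 7, 3, 6, 1, 5.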
If it's possible, you can build the position array using find function, and then sort the array.
Or maybe you can use a map where the key would be the element, and the values a list of its position in the upcoming arrays (A, B and C)
It depends on later uses of those arrays.
I recently came across the elegant projection feature of C++20 <ranges>, which allows shorter and clearer code:
std::vector<std::size_t> B(std::size(A));
std::iota(begin(B), end(B), 0);
std::ranges::sort(B, {}, [&](std::size_t i){ return A[i]; });
{} selects the default comparator, std::ranges::less, which is applied to the projected values. As you can see, we define a function to call on each element before any comparison. This projection feature is quite powerful, since the function can be a lambda, as here, or a pointer to a member function or to a data member. For instance:
struct Item {
float price;
float weight;
float efficiency() const { return price / weight; }
};
int main() {
std::vector<Item> items{{7, 9}, {3, 4}, {5, 3}, {9, 7}};
std::ranges::sort(items, std::greater<>(), &Item::efficiency);
// now items are sorted by their efficiency in decreasing order:
// items = {{5, 3}, {9, 7}, {7, 9}, {3, 4}}
}
If we wanted to sort by increasing price:
std::ranges::sort(items, {}, &Item::price);
Don't define operator< or use lambdas, use a projection!
Are the items in the vector unique? If so, copy the vector and sort one of the copies with std::sort; you can then look up the index each item had in the original vector.
If the vector has to handle duplicate items, I think you're better off implementing your own sort routine.
For this type of question:
Copy the original array into a new array and sort the copy. Then binary search each element of the original array in the sorted copy, and store the position found in a vector or array. This assumes the values are distinct.
input array => a
sorted duplicate array => b
vector => c (stores, for each element of the original array, its position in sorted order)
Syntax:
for(i=0;i<n;i++)
c.push_back(binarysearch(b,n,a[i]));
Here binarysearch is a function that takes the array, the size of the array, and the item to search for, and returns the position of that item.
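One way to sketch this with the standard library is to let std::lower_bound play the role of binarysearch. The function name positions_in_sorted is an assumption for illustration, and the code assumes all values are distinct; with duplicates, equal values would all map to the same position.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// For each element of the original array a, find its position in a sorted copy b.
// Assumes all values in a are distinct.
std::vector<std::size_t> positions_in_sorted(const std::vector<int>& a) {
    std::vector<int> b(a);  // duplicate array
    std::sort(b.begin(), b.end());

    std::vector<std::size_t> c;
    c.reserve(a.size());
    for (int x : a) {
        // Binary search: first position in b whose value is not less than x.
        std::vector<int>::const_iterator it = std::lower_bound(b.begin(), b.end(), x);
        assert(it != b.end() && *it == x);
        c.push_back(static_cast<std::size_t>(it - b.begin()));
    }
    return c;
}
```

For a = [5, 2, 1, 4, 3] this gives c = [4, 1, 0, 3, 2]: c[i] is where a[i] lands in the sorted array, which is the inverse of the index list [2, 1, 4, 3, 0] asked for in the question.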
One solution is to use a 2D vector.
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;
int main() {
vector<vector<double>> val_and_id;
val_and_id.resize(5);
for (int i = 0; i < 5; i++) {
val_and_id[i].resize(2); // one to store value, the other for index.
}
// Store value in dimension 1, and index in the other:
// say values are 5,4,7,1,3.
val_and_id[0][0] = 5.0;
val_and_id[1][0] = 4.0;
val_and_id[2][0] = 7.0;
val_and_id[3][0] = 1.0;
val_and_id[4][0] = 3.0;
val_and_id[0][1] = 0.0;
val_and_id[1][1] = 1.0;
val_and_id[2][1] = 2.0;
val_and_id[3][1] = 3.0;
val_and_id[4][1] = 4.0;
sort(val_and_id.begin(), val_and_id.end());
// display them:
cout << "Index \t" << "Value \n";
for (int i = 0; i < 5; i++) {
cout << val_and_id[i][1] << "\t" << val_and_id[i][0] << "\n";
}
return 0;
}
Here is the output:
Index Value
3 1
4 3
1 4
0 5
2 7

Erasing() an element in a vector doesn't work

I have a vector. I need to delete the last 3 elements in it.
I wrote the logic below, but the program crashes. What could be the mistake?
vector<float>::iterator d = X.end();
for (size_t i = 1; i < 3; i++) {
if (i == 1) X.erase(d);
else X.erase(d - i);
}
If there are at least 3 items in the vector, to delete the last 3 items is simple -- just call pop_back 3 times:
#include <vector>
#include <iostream>
int main()
{
std::vector<float> v = { 1, 2, 3, 4, 5 };
for (int i = 0; i < 3 && !v.empty(); ++i)
v.pop_back();
for ( const auto &item : v ) std::cout << item << ' ';
std::cout << '\n';
}
Output:
1 2
It is undefined behavior to pass the end() iterator to the 1-parameter erase() overload. Even if it weren't, erase() invalidates iterators that are "at and after" the specified element, making d invalid after the 1st loop iteration.
std::vector has a 2-parameter erase() overload that accepts a range of elements to remove. You don't need a manual loop at all:
if (X.size() >= 3)
X.erase(X.end()-3, X.end());
You could use a reverse_iterator:
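#include <iostream>
#include <iterator>
#include <vector>
using namespace std;
int main()
{
vector<float> X = {1.1, 2.2, 3.3, 4.4, 5.5, 6.6};
// start the iterator at the last element
vector<float>::reverse_iterator rit = X.rbegin();
// repeat 3 times (stop early if the vector runs out of elements)
for(size_t i = 0; i < 3 && !X.empty(); i++)
{
// next(rit).base() is a forward iterator to the element rit refers to;
// erase() invalidates rit, so rebuild it from the iterator erase() returns
rit = vector<float>::reverse_iterator(X.erase(next(rit).base()));
}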
// display all elements in vector X
for(float &e: X)
cout << e << '\n';
return 0;
}
There are a few things to mention:
The reverse_iterator rit starts at the last element of the vector X. This position is called rbegin.
erase requires a regular (forward) iterator. We get one from rit by calling base, but that iterator points one element past the one rit refers to, in the forward direction.
That's why rit is advanced by one position before calling base and erase.
If you want to know more about reverse_iterator, I suggest visiting this answer.
First, X.end() doesn't return an iterator to the last element of the vector. It returns an iterator to the position one past the last element, which is not an element the vector actually owns. That's why the program crashes when you try to erase it with X.erase(d).
Instead, provided that the vector contains at least 3 elements, you can do the following:
X.erase( X.end() - 3, X.end() );
This starts at the third-to-last element and erases everything from there up to X.end().
EDIT: Just to clarify, X.end() is a LegacyRandomAccessIterator which is specified to have a valid - operation which returns another LegacyRandomAccessIterator.
This statement
if (i == 1) X.erase(d);
has undefined behavior.
And this statement tries to remove only the element before the last element
else X.erase(d - i);
because you have a loop with only two iterations
for (size_t i = 1; i < 3; i++) {
You need something like the following.
#include <iostream>
#include <vector>
#include <iterator>
#include <algorithm>
int main()
{
std::vector<float> v = { 1, 2, 3, 4, 5 };
auto n = std::min<decltype( v.size() )>( v.size(), 3 );
if ( n ) v.erase( std::prev( std::end( v ), n ), std::end( v ) );
for ( const auto &item : v ) std::cout << item << ' ';
std::cout << '\n';
return 0;
}
The program output is
1 2
The definition of end() from cppreference is:
Returns an iterator referring to the past-the-end element in the vector container.
and slightly below:
It does not point to any element, and thus shall not be dereferenced.
In other words, the vector has no element that end() points to. By dereferencing that non-element through the erase() method, you are possibly altering memory that does not belong to the vector. Hence ugly things can happen from there on.
It is the usual C++ convention to describe intervals as [low, high), with the “low” value included in the interval, and the “high” value excluded from the interval.
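A small sketch of the half-open convention applied to erase, using positions instead of iterators. The helper name erase_range is hypothetical, and it assumes low <= high <= v.size().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Erase the elements at positions [low, high): low is included, high is not.
// Assumes low <= high <= v.size().
void erase_range(std::vector<float>& v, std::size_t low, std::size_t high) {
    assert(low <= high && high <= v.size());
    v.erase(v.begin() + static_cast<std::ptrdiff_t>(low),
            v.begin() + static_cast<std::ptrdiff_t>(high));
}
```

Calling erase_range(v, v.size() - 3, v.size()) on a vector with at least 3 elements removes the last three. The excluded upper bound v.size() corresponds to end(), which is why end() is a valid range boundary even though it can never be erased as a single element.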
A comment (now deleted) in the question stated that "there's no - operator for an iterator." However, the following code compiles and works in both MSVC and clang-cl, with the standard set to either C++17 or C++14:
#include <iostream>
#include <vector>
int main()
{
std::vector<float> X{ 1.1f, 2.2f, 3.3f, 4.4f, 5.5f, 6.6f };
for (auto f : X) std::cout << f << ' '; std::cout << std::endl;
std::vector<float>::iterator d = X.end();
X.erase(d - 3, d); // This strongly suggest that there IS a "-" operator for a vector iterator!
for (auto f : X) std::cout << f << ' '; std::cout << std::endl;
return 0;
}
The definition provided for the operator- is as follows (in the <vector> header):
_NODISCARD _Vector_iterator operator-(const difference_type _Off) const {
_Vector_iterator _Tmp = *this;
return _Tmp -= _Off;
}
However, I'm certainly no C++ language-lawyer, and it is possible that this is one of those 'dangerous' Microsoft extensions. I would be very interested to know if this works on other platforms/compilers.

How to implement what I call a "wraparound sort" around an arbitrary value in C++?

Given a vector of ints, I would like to implement what I am calling in my mind a "wraparound sort." Basically, given an arbitrary value, all of the values greater than or equal to that value are listed first in ascending order, and then all values less than the arbitrary value, again in ascending order. For a value of 12, a wraparound sorted array would look like:
[13, 15, 18, 29, 32, 1, 3, 4, 8, 9, 11]
What would be a way to implement such a sort? I am fine assuming as a starting point a vector that is already sorted in ascending order without the wraparound characteristic, since it's easy enough to get to that state, if such an assumption is useful.
You can do this combining std::partition and std::sort. std::partition will split the vector around the partition point and then you can use std::sort to sort both halves. That would look like
#include <algorithm>
#include <iostream>
#include <vector>
int main()
{
std::vector vec = {1,3,4,8,9,11,13,15,18,29,32};
// all elements >= 12 come first, return value is end iterator of that set
auto front_end = std::partition(vec.begin(), vec.end(), [](auto elm){return elm >= 12;});
// sort first half
std::sort(vec.begin(), front_end);
// sort second half
std::sort(front_end, vec.end());
for (auto e : vec)
std::cout << e << " ";
}
which outputs
13 15 18 29 32 1 3 4 8 9 11
From PaulMcKenzie's comment you can reduce the code size a little bit by using std::stable_partition on a sorted vector. That would look like
#include <algorithm>
#include <iostream>
#include <vector>
int main()
{
std::vector vec = {1,3,4,8,9,11,13,15,18,29,32};
// sort the vector
std::sort(vec.begin(), vec.end());
// all elements >= 12 come first in the order they had in the sorted vector
std::stable_partition(vec.begin(), vec.end(), [](auto elm){return elm >= 12;});
for (auto e : vec)
std::cout << e << " ";
}
Note that std::stable_partition tries to allocate a temporary buffer so that it can run in O(N), like std::partition; if the allocation fails, it falls back to a less efficient O(N log N) algorithm.
You can use lower_bound and rotate on the sorted array:
#include <vector>
#include <algorithm>
#include <iostream>
int main()
{
int pivot = 12;
std::vector<int> vec = {1,3,4,8,9,11,13,15,18,29,32,12};
// sort the array
std::sort(vec.begin(), vec.end());
// find the first element >= the pivot
auto it = std::lower_bound(vec.begin(), vec.end(), pivot);
// rotate so that it's in the first position
std::rotate(vec.begin(), it, vec.end());
for (auto v : vec)
std::cout << v << " ";
}
or you can use sort with a custom comparator:
#include <vector>
#include <algorithm>
#include <iostream>
int main()
{
int pivot = 12;
std::vector<int> vec = {1,3,4,8,9,11,13,15,18,29,32,12};
std::sort(vec.begin(), vec.end(), [pivot](int lhs, int rhs) {
if ((lhs < pivot && rhs < pivot) || (lhs >= pivot && rhs >= pivot))
{
// on the same side wrt pivot -> true if lhs is smaller than rhs
return lhs < rhs;
}
// either (lhs < pivot) && (rhs >= pivot)
// or (lhs >= pivot) && (rhs < pivot)
// -> true if rhs is smaller than lhs
return rhs < lhs;
});
for (auto v : vec)
std::cout << v << " ";
}

Keep Indexing when Sorting Vector c++ [duplicate]

Using C++, and hopefully the standard library, I want to sort a sequence of samples in ascending order, but I also want to remember the original indexes of the new samples.
For example, I have a set, or vector, or matrix of samples A : [5, 2, 1, 4, 3]. I want to sort these to be B : [1,2,3,4,5], but I also want to remember the original indexes of the values, so I can get another set which would be:
C : [2, 1, 4, 3, 0 ] - which corresponds to the index of each element in 'B', in the original 'A'.
For example, in Matlab you can do:
[a,b]=sort([5, 8, 7])
a = 5 7 8
b = 1 3 2
Can anyone see a good way to do this?
Using C++ 11 lambdas:
#include <iostream>
#include <vector>
#include <numeric> // std::iota
#include <algorithm> // std::sort, std::stable_sort
using namespace std;
template <typename T>
vector<size_t> sort_indexes(const vector<T> &v) {
// initialize original index locations
vector<size_t> idx(v.size());
iota(idx.begin(), idx.end(), 0);
// sort indexes based on comparing values in v
// using std::stable_sort instead of std::sort
// to avoid unnecessary index re-orderings
// when v contains elements of equal values
stable_sort(idx.begin(), idx.end(),
[&v](size_t i1, size_t i2) {return v[i1] < v[i2];});
return idx;
}
Now you can use the returned index vector in iterations such as
for (auto i: sort_indexes(v)) {
cout << v[i] << endl;
}
You can also choose to supply your original index vector, sort function, comparator, or automatically reorder v in the sort_indexes function using an extra vector.
You could sort std::pair instead of just ints - first int is original data, second int is original index. Then supply a comparator that only sorts on the first int. Example:
Your problem instance: v = [5 7 8]
New problem instance: v_prime = [<5,0>, <8,1>, <7,2>]
Sort the new problem instance using a comparator like:
typedef std::pair<int,int> mypair;
bool comparator ( const mypair& l, const mypair& r)
{ return l.first < r.first; }
// forgetting the syntax here but intent is clear enough
The result of std::sort on v_prime, using that comparator, should be:
v_prime = [<5,0>, <7,2>, <8,1>]
You can peel out the indices by walking the vector, grabbing .second from each std::pair.
Suppose Given vector is
A=[2,4,3]
Create a new vector
V=[0,1,2] // indicating positions
Sort V and while sorting instead of comparing elements of V , compare corresponding elements of A
//Assume A is a given vector with N elements
vector<int> V(N);
std::iota(V.begin(),V.end(),0); //Initializing
sort( V.begin(),V.end(), [&](int i,int j){return A[i]<A[j];} );
vector<pair<int,int> >a;
for (i = 0 ;i < n ; i++) {
// filling the original array
cin >> k;
a.push_back (make_pair (k,i)); // k = value, i = original index
}
sort (a.begin(),a.end());
for (i = 0 ; i < n ; i++){
cout << a[i].first << " " << a[i].second << "\n";
}
Now a contains both both our values and their respective indices in the sorted.
a[i].first = value at i'th.
a[i].second = idx in initial array.
I wrote generic version of index sort.
template <class RAIter, class Compare>
void argsort(RAIter iterBegin, RAIter iterEnd, Compare comp,
std::vector<size_t>& indexes) {
std::vector< std::pair<size_t,RAIter> > pv ;
pv.reserve(iterEnd - iterBegin) ;
RAIter iter ;
size_t k ;
for (iter = iterBegin, k = 0 ; iter != iterEnd ; iter++, k++) {
pv.push_back( std::pair<int,RAIter>(k,iter) ) ;
}
std::sort(pv.begin(), pv.end(),
[&comp](const std::pair<size_t,RAIter>& a, const std::pair<size_t,RAIter>& b) -> bool
{ return comp(*a.second, *b.second) ; }) ;
indexes.resize(pv.size()) ;
std::transform(pv.begin(), pv.end(), indexes.begin(),
[](const std::pair<size_t,RAIter>& a) -> size_t { return a.first ; }) ;
}
Usage is the same as that of std::sort except for an index container to receive sorted indexes.
testing:
int a[] = { 3, 1, 0, 4 } ;
std::vector<size_t> indexes ;
argsort(a, a + sizeof(a) / sizeof(a[0]), std::less<int>(), indexes) ;
for (size_t i : indexes) printf("%d\n", int(i)) ;
you should get 2 1 0 3.
for the compilers without c++0x support, replace the lamba expression as a class template:
template <class RAIter, class Compare>
class PairComp {
public:
Compare comp ;
PairComp(Compare comp_) : comp(comp_) {}
bool operator() (const std::pair<size_t,RAIter>& a,
const std::pair<size_t,RAIter>& b) const { return comp(*a.second, *b.second) ; }
} ;
and rewrite std::sort as
std::sort(pv.begin(), pv.end(), PairComp(comp)()) ;
I came across this question, and figured out sorting the iterators directly would be a way to sort the values and keep track of indices; There is no need to define an extra container of pairs of ( value, index ) which is helpful when the values are large objects; The iterators provides the access to both the value and the index:
/*
* a function object that allows to compare
* the iterators by the value they point to
*/
template < class RAIter, class Compare >
class IterSortComp
{
public:
IterSortComp ( Compare comp ): m_comp ( comp ) { }
inline bool operator( ) ( const RAIter & i, const RAIter & j ) const
{
return m_comp ( * i, * j );
}
private:
const Compare m_comp;
};
template <class INIter, class RAIter, class Compare>
void itersort ( INIter first, INIter last, std::vector < RAIter > & idx, Compare comp )
{
idx.resize ( std::distance ( first, last ) );
for ( typename std::vector < RAIter >::iterator j = idx.begin( ); first != last; ++ j, ++ first )
* j = first;
std::sort ( idx.begin( ), idx.end( ), IterSortComp< RAIter, Compare > ( comp ) );
}
as for the usage example:
std::vector < int > A ( n );
// populate A with some random values
std::generate ( A.begin( ), A.end( ), rand );
std::vector < std::vector < int >::const_iterator > idx;
itersort ( A.begin( ), A.end( ), idx, std::less < int > ( ) );
now, for example, the 5th smallest element in the sorted vector would have value **idx[ 5 ] and its index in the original vector would be distance( A.begin( ), *idx[ 5 ] ) or simply *idx[ 5 ] - A.begin( ).
Consider using std::multimap as suggested by #Ulrich Eckhardt. Just that the code could be made even simpler.
Given
std::vector<int> a = {5, 2, 1, 4, 3}; // a: 5 2 1 4 3
To sort in the mean time of insertion
std::multimap<int, std::size_t> mm;
for (std::size_t i = 0; i != a.size(); ++i)
mm.insert({a[i], i});
To retrieve values and original indices
std::vector<int> b;
std::vector<std::size_t> c;
for (const auto & kv : mm) {
b.push_back(kv.first); // b: 1 2 3 4 5
c.push_back(kv.second); // c: 2 1 4 3 0
}
The reason to prefer a std::multimap to a std::map is to allow equal values in original vectors. Also please note that, unlike for std::map, operator[] is not defined for std::multimap.
There is another way to solve this, using a map:
vector<double> v = {...}; // input data
map<double, unsigned> m; // mapping from value to its index
for (auto it = v.begin(); it != v.end(); ++it)
m[*it] = it - v.begin();
This will eradicate non-unique elements though. If that's not acceptable, use a multimap:
vector<double> v = {...}; // input data
multimap<double, unsigned> m; // mapping from value to its index
for (auto it = v.begin(); it != v.end(); ++it)
m.insert(make_pair(*it, it - v.begin()));
In order to output the indices, iterate over the map or multimap:
for (auto it = m.begin(); it != m.end(); ++it)
cout << it->second << endl;
Beautiful solution by #Lukasz Wiklendt! Although in my case I needed something more generic so I modified it a bit:
template <class RAIter, class Compare>
vector<size_t> argSort(RAIter first, RAIter last, Compare comp) {
vector<size_t> idx(last-first);
iota(idx.begin(), idx.end(), 0);
auto idxComp = [&first,comp](size_t i1, size_t i2) {
return comp(first[i1], first[i2]);
};
sort(idx.begin(), idx.end(), idxComp);
return idx;
}
Example: Find indices sorting a vector of strings by length, except for the first element which is a dummy.
vector<string> test = {"dummy", "a", "abc", "ab"};
auto comp = [](const string &a, const string& b) {
return a.length() > b.length();
};
const auto& beginIt = test.begin() + 1;
vector<size_t> ind = argSort(beginIt, test.end(), comp);
for(auto i : ind)
cout << beginIt[i] << endl;
prints:
abc
ab
a
Build a vector of std::pair (index, iterator) inside a function, then sort the pairs.
Generic version:
template< class RandomAccessIterator,class Compare >
auto sort2(RandomAccessIterator begin,RandomAccessIterator end,Compare cmp) ->
std::vector<std::pair<std::uint32_t,RandomAccessIterator>>
{
using valueType=typename std::iterator_traits<RandomAccessIterator>::value_type;
using Pair=std::pair<std::uint32_t,RandomAccessIterator>;
std::vector<Pair> index_pair;
index_pair.reserve(std::distance(begin,end));
for(std::uint32_t idx=0;begin!=end;++begin,++idx){
index_pair.push_back(Pair(idx,begin));
}
std::sort( index_pair.begin(),index_pair.end(),[&](const Pair& lhs,const Pair& rhs){
return cmp(*lhs.second,*rhs.second);
});
return index_pair;
}
My solution uses a residue technique. We can place the values to be sorted in the upper 2 bytes of an int and the indices of the elements in the lower 2 bytes (this assumes a 32-bit int):
int myints[] = {32,71,12,45,26,80,53,33};
for (int i = 0; i < 8; i++)
myints[i] = myints[i]*(1 << 16) + i;
Then sort the array myints as usual:
std::vector<int> myvector(myints, myints+8);
sort(myvector.begin(), myvector.begin()+8, std::less<int>());
After that you can recover the elements' indices as the remainder modulo 1 << 16. The following code prints the indices of the values sorted in ascending order:
for (std::vector<int>::iterator it = myvector.begin(); it != myvector.end(); ++it)
std::cout << ' ' << (*it)%(1 << 16);
Of course, this technique works only for relatively small, non-negative values in the original array myints (i.e. those which fit into the upper 2 bytes of an int) and for arrays with fewer than 65536 elements. But it has the additional benefit of distinguishing identical values of myints: their indices are printed in their original order.
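A self-contained sketch of the same packing idea, under the assumption that every value and every index fits in 16 bits. The helper name sorted_indices_packed is made up for this example. It uses unsigned 32-bit storage with shifts and masks, which sidesteps the signed overflow that the int version risks for values of 32768 and above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the original indices in ascending order of value.
// Assumes every value and every index fits in 16 bits.
std::vector<std::size_t> sorted_indices_packed(const std::vector<std::uint16_t>& values) {
    assert(values.size() <= 65536);  // indices must fit in the low 16 bits
    std::vector<std::uint32_t> packed(values.size());
    for (std::size_t i = 0; i < values.size(); ++i) {
        // value in the high 16 bits, index in the low 16 bits
        packed[i] = (static_cast<std::uint32_t>(values[i]) << 16)
                    | static_cast<std::uint32_t>(i);
    }
    std::sort(packed.begin(), packed.end());
    std::vector<std::size_t> indices(packed.size());
    for (std::size_t i = 0; i < packed.size(); ++i) {
        indices[i] = packed[i] & 0xFFFFu;  // mask off the value, keep the index
    }
    return indices;
}
```

Called with the values 32, 71, 12, 45, 26, 80, 53, 33 from the answer, it should return the indices 2, 4, 0, 7, 3, 6, 1, 5.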
If it's possible, you can build the position array using a find function, and then sort the array.
Or maybe you can use a map where the key is the element and the value is a list of its positions in the upcoming arrays (A, B and C).
It depends on how those arrays are used later.
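The map idea could look roughly like the sketch below. The function name positions_by_value and the choice of int elements are assumptions made for illustration, not part of the original question.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Maps each distinct value to the list of positions where it occurs in `a`.
// Iterating over the returned map visits the values in ascending order.
std::map<int, std::vector<std::size_t>> positions_by_value(const std::vector<int>& a) {
    std::map<int, std::vector<std::size_t>> positions;
    for (std::size_t i = 0; i < a.size(); ++i) {
        positions[a[i]].push_back(i);  // creates the list on first use
    }
    return positions;
}
```

Because std::map keeps its keys ordered, walking the result yields the sorted values together with every original position of each, duplicates included.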
I recently came across the elegant projection feature of C++20 <ranges>, which allows writing shorter and clearer code:
std::vector<std::size_t> B(std::size(A));
std::iota(begin(B), end(B), 0);
std::ranges::sort(B, {}, [&](std::size_t i){ return A[i]; });
{} selects the default comparator, std::ranges::less. As you can see, we define a function to call on each element before any comparison. This projection feature is quite powerful, since the function can be a lambda, as here, or even a member function or a data member. For instance:
struct Item {
float price;
float weight;
float efficiency() const { return price / weight; }
};
int main() {
std::vector<Item> items{{7, 9}, {3, 4}, {5, 3}, {9, 7}};
std::ranges::sort(items, std::greater<>(), &Item::efficiency);
// now items are sorted by their efficiency in decreasing order:
// items = {{5, 3}, {9, 7}, {7, 9}, {3, 4}}
}
If we wanted to sort by increasing price:
std::ranges::sort(items, {}, &Item::price);
Don't define operator< or use lambdas, use a projection!
Are the items in the vector unique? If so, copy the vector and sort one of the copies with the STL sort; then you can find which index each item had in the original vector.
If the vector is supposed to handle duplicate items, I think you're better off implementing your own sort routine.
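A minimal sketch of the copy-and-look-up idea, assuming distinct int elements. The name original_indices is made up here, and using std::find for the lookup makes the whole thing O(n^2).

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// For each element of the sorted copy, look up where it sat in `original`.
// Assumes all elements are distinct; with duplicates, std::find would keep
// returning the first matching position.
std::vector<std::size_t> original_indices(const std::vector<int>& original) {
    std::vector<int> sorted(original);
    std::sort(sorted.begin(), sorted.end());
    std::vector<std::size_t> indices;
    indices.reserve(sorted.size());
    for (int x : sorted) {
        auto it = std::find(original.begin(), original.end(), x);
        indices.push_back(static_cast<std::size_t>(std::distance(original.begin(), it)));
    }
    return indices;
}
```

For the input 5, 2, 1, 4, 3 this should give 2, 1, 4, 3, 0.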
For this type of question:
Copy the original array into a second array and sort the copy. Then binary-search each element of the original array in the sorted copy, and store the position found in a vector or array.
input array => a
sorted duplicate array => b
vector => c (stores the position of each element of the original array)
Syntax:
for(i=0;i<n;i++)
c.push_back(binarysearch(b,n,a[i]));
Here binarysearch is a function that takes the array, the size of the array and the item to search for, and returns the position of that item.
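One way to read this answer in standard C++ is sketched below, with std::lower_bound standing in for the hand-written binarysearch. The function name sorted_positions is hypothetical. Note that equal elements all get the position of their first occurrence in the sorted copy.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// For each element of `a`, returns its position in a sorted copy of `a`.
std::vector<std::size_t> sorted_positions(const std::vector<int>& a) {
    std::vector<int> b(a);            // duplicate array
    std::sort(b.begin(), b.end());    // binary search needs sorted input
    std::vector<std::size_t> c;
    c.reserve(a.size());
    for (int x : a) {
        // lower_bound is a binary search for the first element not less than x
        auto it = std::lower_bound(b.begin(), b.end(), x);
        c.push_back(static_cast<std::size_t>(it - b.begin()));
    }
    return c;
}
```

The sort costs O(n log n) and each of the n lookups costs O(log n), so the total stays at O(n log n).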
One solution is to use a 2D vector.
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;
int main() {
vector<vector<double>> val_and_id;
val_and_id.resize(5);
for (int i = 0; i < 5; i++) {
val_and_id[i].resize(2); // one to store value, the other for index.
}
// Store value in dimension 1, and index in the other:
// say values are 5,4,7,1,3.
val_and_id[0][0] = 5.0;
val_and_id[1][0] = 4.0;
val_and_id[2][0] = 7.0;
val_and_id[3][0] = 1.0;
val_and_id[4][0] = 3.0;
val_and_id[0][1] = 0.0;
val_and_id[1][1] = 1.0;
val_and_id[2][1] = 2.0;
val_and_id[3][1] = 3.0;
val_and_id[4][1] = 4.0;
sort(val_and_id.begin(), val_and_id.end());
// display them:
cout << "Index \t" << "Value \n";
for (int i = 0; i < 5; i++) {
cout << val_and_id[i][1] << "\t" << val_and_id[i][0] << "\n";
}
return 0;
}
Here is the output:
Index Value
3 1
4 3
1 4
0 5
2 7

Finding Frequency of numbers in a given group of numbers

Suppose we have a vector/array in C++ and we wish to find which of its N elements occurs most often and output that highest count. Which algorithm is best suited for this job?
example:
int a[] = { 2, 456, 34, 3456, 2, 435, 2, 456, 2 };
the output is 4 because 2 occurs 4 times, which is the maximum number of occurrences of any element.
Sort the array and then do a quick pass to count each number. The algorithm has O(N*logN) complexity.
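A minimal sketch of the sort-then-scan approach, assuming int elements. The function name max_frequency_sorted is made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the highest number of occurrences of any single value (0 if empty).
// Takes the vector by value so the caller's data is not reordered.
std::size_t max_frequency_sorted(std::vector<int> v) {
    std::sort(v.begin(), v.end());  // equal values become adjacent
    std::size_t best = 0;
    std::size_t run = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        // extend the current run of equal values, or start a new one
        run = (i > 0 && v[i] == v[i - 1]) ? run + 1 : 1;
        best = std::max(best, run);
    }
    return best;
}
```

For the array from the question this should return 4.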
Alternatively, create a hash table, using the number as the key, and store a counter for each key. You'll be able to count all elements in one pass; however, the complexity of the algorithm now depends on the complexity of your hashing function.
Optimized for space:
Quicksort (for example) then iterate over the items, keeping track of largest count only.
At best O(N log N).
Optimized for speed:
Iterate over all elements, keeping a separate count for each distinct value (for example in a hash table).
With constant-time count updates this is O(n), at the cost of extra memory for the counts.
If you have the RAM and your values are not too large, use counting sort.
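As a rough sketch of the counting idea, assuming all values are non-negative and bounded by a known maximum. Both the function name max_frequency_counting and the max_value parameter are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the highest occurrence count.
// Assumes every value x in `v` satisfies 0 <= x <= max_value.
std::size_t max_frequency_counting(const std::vector<int>& v, int max_value) {
    // one counter per possible value, all starting at zero
    std::vector<std::size_t> count(static_cast<std::size_t>(max_value) + 1, 0);
    for (int x : v) {
        ++count[static_cast<std::size_t>(x)];
    }
    return *std::max_element(count.begin(), count.end());
}
```

The running time is O(n + max_value), so this only pays off when the value range is not much larger than the number of elements.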
A possible C++ implementation that makes use of STL could be:
#include <iostream>
#include <algorithm>
#include <map>
// functor that remembers the value with the highest count seen so far
struct maxoccur
{
int val;
int rep;
maxoccur()
: val(0),
rep(0)
{}
void operator()(const std::pair<const int,int> &e)
{
std::cout << "pair: " << e.first << " " << e.second << std::endl;
if ( rep < e.second ) {
val = e.first;
rep = e.second;
}
}
};
int
main(int argc, char *argv[])
{
int a[] = {2,456,34,3456,2,435,2,456,2};
std::map<int,int> m;
// load the map: value -> number of occurrences
for(unsigned int i=0; i< sizeof(a)/sizeof(a[0]); i++)
m [a[i]]++;
// find the max occurrence...
maxoccur ret = std::for_each(m.begin(), m.end(), maxoccur());
std::cout << "value:" << ret.val << " max repetition:" << ret.rep << std::endl;
return 0;
}
a bit of pseudo-code:
// split the string into an array first
array = strsplit(numbers) // PHP-style function that splits a string into its components
i = 0
while (i < count(array))
{
if (isset(list[array[i]]))
{
list[array[i]]['count'] = list[array[i]]['count'] + 1
}
else
{
list[array[i]]['count'] = 1
list[array[i]]['number'] = array[i]
}
i = i + 1
}
usort(list) // usort is a PHP function that sorts an array by its values, not its keys (here: by 'count', descending); I'm assuming C++ has something that does this
print list[0]['number'] // should contain the most used number
The hash algorithm (build count[i] = #occurrences(i) in basically linear time) is very practical, but is theoretically not strictly O(n) because there could be hash collisions during the process.
An interesting special case of this question is the majority problem, where you want to find an element which is present in more than n/2 of the array entries, if any such element exists.
Here is a quick explanation, and a more detailed explanation of how to do this in linear time, without any sort of hash trickiness.
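One well-known linear-time method for the majority problem is the Boyer-Moore majority vote algorithm; whether it is the one the linked explanations describe is an assumption. A minimal sketch follows, with a made-up function name and std::optional (C++17) to signal that no majority exists.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Returns the element occurring in more than half of the entries, if any.
std::optional<int> majority_element(const std::vector<int>& v) {
    int candidate = 0;
    std::size_t votes = 0;
    // First pass: pair off differing elements; a true majority survives.
    for (int x : v) {
        if (votes == 0) {
            candidate = x;
            votes = 1;
        } else if (x == candidate) {
            ++votes;
        } else {
            --votes;
        }
    }
    // Second pass: verify, since the survivor is only a candidate.
    std::size_t occurrences = 0;
    for (int x : v) {
        if (x == candidate) ++occurrences;
    }
    if (occurrences * 2 > v.size()) return candidate;
    return std::nullopt;
}
```

It uses two passes and constant extra space. For the array from the question it returns no value, because 2 occurs only 4 times out of 9.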
If the range of elements is large compared with the number of elements, I would, as others have said, just sort and scan. This is time n*log n and no additional space (maybe log n additional).
The problem with the counting sort is that, if the range of values is large, it can take more time to initialize the count array than to sort.
Here's my complete, tested version, using a std::tr1::unordered_map (plain std::unordered_map since C++11).
I make this approximately O(n). First it iterates through the n input values to insert/update the counts in the unordered_map (expected constant time per update), then it does a partial_sort_copy for the single top entry, which is O(n). 2*O(n) ~= O(n).
#include <unordered_map>
#include <vector>
#include <algorithm>
#include <cassert>
#include <iostream>
#include <iterator>
#include <utility>
namespace {
// Only used in most_frequent but can't be a local class because of the member template
struct second_greater {
// Need to compare two (slightly) different types of pairs
template <typename PairA, typename PairB>
bool operator() (const PairA& a, const PairB& b) const
{ return a.second > b.second; }
};
}
template <typename Iter>
std::pair<typename std::iterator_traits<Iter>::value_type, unsigned int>
most_frequent(Iter begin, Iter end)
{
typedef typename std::iterator_traits<Iter>::value_type value_type;
typedef std::pair<value_type, unsigned int> result_type;
std::unordered_map<value_type, unsigned int> counts;
for(; begin != end; ++begin)
// This is safe because new entries in the map are defined to be initialized to 0 for
// built-in numeric types - no need to initialize them first
++ counts[*begin];
// Only need the top one at this point (could easily expand to top-n)
std::vector<result_type> top(1);
std::partial_sort_copy(counts.begin(), counts.end(),
top.begin(), top.end(), second_greater());
return top.front();
}
int main(int argc, char* argv[])
{
int a[] = { 2, 456, 34, 3456, 2, 435, 2, 456, 2 };
std::pair<int, unsigned int> m = most_frequent(a, a + (sizeof(a) / sizeof(a[0])));
std::cout << "most common = " << m.first << " (" << m.second << " instances)" << std::endl;
assert(m.first == 2);
assert(m.second == 4);
return 0;
}
This runs in O(n), but it needs a second array, which can get large: its size depends on the range of the values.
for(i=0;i<n;i++)
count[a[i]]++;
max=count[0];
index=0;
for(i=1;i<m;i++)
if(count[i]>max){ max=count[i]; index=i; }
The output is index: the element that occurs the maximum number of times in the array.
Here a[] is the data array (n elements) in which we search for the most frequent number, and count[] (m elements, all initially zero) holds the count of each value.
Note: this assumes we already know the range of the data in the array.
For example, if the data ranges from 1 to 100, use a count array of 101 elements and increment the entry indexed by each value as it occurs.
Now, in the year 2022 we have
namespace aliases
more modern containers like std::unordered_map
CTAD (Class Template Argument Deduction)
range based for loops
using alias declarations
the std::ranges library
more modern algorithms
projections
structured bindings
With that we can now write:
#include <iostream>
#include <vector>
#include <unordered_map>
#include <algorithm>
namespace rng = std::ranges;
int main() {
// Demo data
std::vector data{ 2, 456, 34, 3456, 2, 435, 2, 456, 2 };
// Count values
using Counter = std::unordered_map<decltype (data)::value_type, std::size_t> ;
Counter counter{}; for (const auto& d : data) counter[d]++;
// Get max
const auto& [value, count] = *rng::max_element(counter, {}, &Counter::value_type::second);
// Show output
std::cout << '\n' << value << " found " << count << " times\n";
}