Print and count the number of permutations (without using STL next_permutation) - C++

I'm a C programmer and trying to get better at C++. I want to implement a permutation function (without using the STL algorithms). I came up with the following algorithm (out of my C way of thinking), but
a) it crashes for k > 2 (I suppose because the element that the iterator
points to, gets deleted, is inserted back and then incremented).
b) the erase/insert operations seem unnecessary.
How would the C++ experts amongst you implement it?
template <class T>
class Ordering {
public:
Ordering(int n);
int combination(int k);
int permutation(int k);
private:
set<T> elements;
vector<T> order;
};
template <class T>
int Ordering<T>::permutation (int k) {
if (k > elements.size()) {
return 0;
}
if (k == 0) {
printOrder();
return 1;
}
int count = 0;
for (typename set<T>::iterator it = elements.begin();
it != elements.end();
it++
)
{
order[k-1] = *it;
elements.erase(*it);
count += permutation(k-1);
elements.insert(*it);
}
return count;
}

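The problem is in your iteration over the elements set. You erase the element that it refers to, which invalidates it, and then you increment it. That cannot work.
If you insist on using this approach, you must not touch the invalidated iterator again. Storing the successor of it before calling set::erase is not enough here: the recursive call erases and re-inserts the remaining elements as well, so a saved successor is invalidated too. Instead, save the value, move the increment part of your for loop into the loop body, and look the position up again after re-inserting.
Like this:
for (typename set<T>::iterator it = elements.begin();
it != elements.end();
/* nothing here */
)
{
const T value = *it; // copy: *it is gone after erase
order[k-1] = value;
elements.erase(it);
count += permutation(k-1);
elements.insert(value);
it = elements.upper_bound(value); // first element after value
}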
Edit: One possible way of temporarily "removing" objects would be to keep a flag next to each element, set it to false before descending and back to true afterwards. Elements of a std::set are const, so a std::set<std::pair<T,bool>> would not let you write it->second = false; a std::map<T,bool> does. Then, while iterating, you can skip entries whose flag is false. This adds some overhead, since every level of the recursion also walks over the entries that are currently "removed". But inserting and removing elements adds a logarithmic overhead every time, which is probably worse.
If you used a (custom) linked list (perhaps you can even get std::list to do that) you could very inexpensively remove and re-insert objects.
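A minimal sketch of the flag idea, under a few assumptions: the elements live in a std::vector instead of a std::set, a parallel std::vector<bool> marks which ones are currently in use, and the names permute and count_permutations are made up for this illustration (they are not part of the Ordering class above). Nothing is erased or re-inserted, so no iterator is ever invalidated:

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical helper: extends `order` until it holds k items,
// prints each complete ordering and returns how many were printed.
template <class T>
int permute(const std::vector<T>& items, std::vector<bool>& used,
            std::vector<T>& order, std::size_t k)
{
    if (order.size() == k) {
        for (const T& x : order) std::cout << x << ' ';
        std::cout << '\n';
        return 1;
    }
    int count = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (used[i]) continue;      // skip elements that are "removed"
        used[i] = true;             // "remove" items[i]
        order.push_back(items[i]);
        count += permute(items, used, order, k);
        order.pop_back();
        used[i] = false;            // "re-insert" items[i]
    }
    return count;
}

// Hypothetical entry point: counts (and prints) all k-permutations of items.
template <class T>
int count_permutations(const std::vector<T>& items, std::size_t k)
{
    if (k > items.size()) return 0;
    std::vector<bool> used(items.size(), false);
    std::vector<T> order;
    order.reserve(k);
    return permute(items, used, order, k);
}
```

With items {1, 2, 3} and k = 2 this prints the six ordered pairs and returns 6. The price is that every level scans all n flags, which matches the overhead described above.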

Related

faster erase-remove idiom when I don't care about order and don't have duplicates?

I have a vector of objects and want to delete by value. However the value only occurs once if at all, and I don't care about sorting.
Obviously, if such delete-by-values were extremely common, and/or the data set quite big, a vector wouldn't be the best data structure. But let's say I've determined that not to be the case.
To be clear, if my code were C, I'd be happy with the following:
void delete_by_value( int* const piArray, int& n, int iValue ) {
for ( int i = 0; i < n; i++ ) {
if ( piArray[ i ] == iValue ) {
piArray[ i ] = piArray[ --n ];
return;
}
}
}
It seems that the "modern idiom" approach using std::algos and container methods would be:
v.erase(std::remove(v.begin(), v.end(), iValue), v.end());
But that should be far slower since for a random existent element, it's n/2 moves and n compares. My version is 1 move and n/2 compares.
Surely there's a better way to do this in "the modern idiom" than erase-remove-idiom? And if not why not?
Use std::find to replace the loop. Take the replacement value from the last element (the predecessor of the end iterator), and use that same iterator to erase it. As this iterator refers to the last element, erase is cheap. Bonus: a bool return value for success checking, and a template over the element type instead of int.
template<typename T>
bool delete_by_value(std::vector<T> &v, T const &del) {
auto final = v.end();
auto found = std::find(v.begin(), final, del);
if(found == final) return false;
*found = *--final;
v.erase(final);
return true;
}
Surely there's a better way to do this in "the modern idiom" than erase-remove-idiom?
There isn't a ready-made function for every niche use case in the standard library. Unstable remove is one of the functions that is not provided. It was proposed a while back (P0041R0), though. Likewise, there are no special versions of algorithms for the special case of vectors that do not contain duplicates.
So, you'll need to implement the algorithm yourself if you wish to use an optimal algorithm. There is std::find for linear search. After that, you only need to assign from last element and finally pop it off.
Most implementations of std::vector::resize will not reallocate if you make the size of the vector smaller. So, the following will probably have similar performance to the C example.
void find_and_delete(std::vector<int>& v, int value) {
auto it = std::find(v.begin(), v.end(), value);
if (it != v.end()) {
*it = v.back();
v.resize(v.size() - 1);
}
}
The C++ way would be mostly identical, using std::vector:
template <typename T>
void delete_by_value(std::vector<T>& v, const T& value) {
auto it = std::find(v.begin(), v.end(), value);
if (it != v.end()) {
*it = std::move(v.back());
v.pop_back();
}
}

Implementing stable_partition for forward_list

I want to implement something similar to std::stable_partition, but for the C++11 forward_list.
The STL version requires bidirectional iterators; however, by using container-specific methods I believe I can get the same outcome efficiently.
Example declaration :
template <typename T, typename UnaryPredicate>
void stable_partition(std::forward_list<T>& list, UnaryPredicate p);
(while possible to add begin and end iterators, I omitted them for brevity. The same for returning the partition point )
I already worked out the algorithm to accomplish this on my own list type, but I have troubles implementing it in stl.
The key method appears to be splice_after. Other methods require memory allocations and copying elements.
Algorithm sketch :
1. Create a new empty list. It will hold all elements that p returns true on.
2. Loop over the target list, moving items to the true list according to the result of invoking p.
3. Concatenate the true list to the beginning of the target list.
With proper coding this should be linear time (all operations inside the loop can be done in constant time) and without extra memory allocation or copying.
I am trying to implement the second step using splice_after, but I end up either splicing the wrong element or invalidating my iterators.
The question:
What is the correct use of splice_after, so that I avoid
mixing iterators between lists and insert the correct elements?
First Attempt (how I hoped it works):
template <typename T, typename UnaryPredicate>
void stable_partition(std::forward_list<T>& list, UnaryPredicate p)
{
std::forward_list<T> positives;
auto positives_iter = positives.before_begin();
for (auto iter = list.begin(); iter != list.end(); ++iter)
{
if (p(*iter))
positives.splice_after(positives_iter, list, iter);
}
list.splice_after(list.before_begin(), positives);
}
Unfortunately this has at least one major flaw: splice_after moves the element after iter, not the element iter points to, so the wrong element is transferred.
Also, once an element has been moved to the other list, incrementing iter traverses the wrong list.
Having to maintain the preceding iterators for std::forward_list::splice_after makes it a bit trickier, but still pretty short:
template<class T, class UnaryPredicate>
std::array<std::forward_list<T>, 2>
stable_partition(std::forward_list<T>& list, UnaryPredicate p) {
std::array<std::forward_list<T>, 2> r;
decltype(r[0].before_begin()) pos[2] = {r[0].before_begin(), r[1].before_begin()};
for(auto i = list.before_begin(), ni = i, e = list.end(); ++ni != e; ni = i) {
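bool idx = p(*ni); // 0: predicate false, 1: predicate true
auto& tail = pos[idx]; // last node of the destination list
r[idx].splice_after(tail, list, i); // moves *ni, the element after i
++tail;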
}
return r;
}
Usage example:
template<class T>
void print(std::forward_list<T> const& list) {
for(auto const& e : list)
std::cout << e << ' ';
std::cout << '\n';
}
int main() {
std::forward_list<int> l{0,1,2,3,4,5,6};
print(l);
// Partition into even and odd elements.
auto p = stable_partition(l, [](auto e) { return e % 2; });
print(p[0]); // Even elements.
print(p[1]); // Odd elements.
}
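If the goal is the void, in-place signature from the question rather than two separate lists, the same splice_after pattern might be applied as follows. This is only a sketch: stable_partition_in_place is a name chosen here to avoid clashing with std::stable_partition, and it assumes the predicate does not modify the list.

```cpp
#include <cassert>
#include <forward_list>
#include <iterator>

// Moves every element for which pred is true to the front of `list`,
// keeping the relative order inside both groups. Nodes are relinked,
// no element is copied or allocated.
template <typename T, typename UnaryPredicate>
void stable_partition_in_place(std::forward_list<T>& list, UnaryPredicate pred)
{
    std::forward_list<T> positives;
    auto tail = positives.before_begin();  // last node of `positives`
    auto prev = list.before_begin();       // node before the one under test

    for (auto cur = std::next(prev); cur != list.end(); cur = std::next(prev)) {
        if (pred(*cur)) {
            // splice_after moves the element *after* prev, which is *cur.
            positives.splice_after(tail, list, prev);
            ++tail;   // tail now refers to the node that was just moved
            // prev stays put: its new successor is the next candidate
        } else {
            ++prev;   // keep the element and step forward
        }
    }
    // Put all matching elements back in front of the remaining ones.
    list.splice_after(list.before_begin(), positives);
}
```

Keeping prev as the only iterator into list avoids both problems of the first attempt: the element that gets spliced is always the one that was tested, and the loop never advances an iterator that has moved to the other list.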

`std::list<>::sort()` - why the sudden switch to top-down strategy?

I remember that since the beginning of times the most popular approach to implementing std::list<>::sort() was the classic Merge Sort algorithm implemented in bottom-up fashion (see also What makes the gcc std::list sort implementation so fast?).
I remember seeing someone aptly refer to this strategy as "onion chaining" approach.
At least that's the way it is in GCC's implementation of C++ standard library (see, for example, here). And this is how it was in old Dinkumware's STL in MSVC version of standard library, as well as in all versions of MSVC all the way to VS2013.
However, the standard library supplied with VS2015 suddenly no longer follows this sorting strategy. The library shipped with VS2015 uses a rather straightforward recursive implementation of top-down Merge Sort. This strikes me as strange, since top-down approach requires access to the mid-point of the list in order to split it in half. Since std::list<> does not support random access, the only way to find that mid-point is to literally iterate through half of the list. Also, at the very beginning it is necessary to know the total number of elements in the list (which was not necessarily an O(1) operation before C++11).
Nevertheless, std::list<>::sort() in VS2015 does exactly that. Here's an excerpt from that implementation that locates the mid-point and performs recursive calls
...
iterator _Mid = _STD next(_First, _Size / 2);
_First = _Sort(_First, _Mid, _Pred, _Size / 2);
_Mid = _Sort(_Mid, _Last, _Pred, _Size - _Size / 2);
...
As you can see, they just nonchalantly use std::next to walk through the first half of the list and arrive at _Mid iterator.
What could be the reason behind this switch, I wonder? All I see is a seemingly obvious inefficiency of repetitive calls to std::next at each level of recursion. Naive logic says that this is slower. If they are willing to pay this kind of price, they probably expect to get something in return. What are they getting then? I don't immediately see this algorithm as having better cache behavior (compared to the original bottom-up approach). I don't immediately see it as behaving better on pre-sorted sequences.
Granted, since C++11 std::list<> is basically required to store its element count, which makes the above slightly more efficient, since we always know the element count in advance. But that still does not seem to be enough to justify the sequential scan on each level of recursion.
(Admittedly, I haven't tried to race the implementations against each other. Maybe there are some surprises there.)
Note this answer has been updated to address all of the issues mentioned in the comments below and after the question, by making the same change from an array of lists to an array of iterators, while retaining the faster bottom up merge sort algorithm, and eliminating the small chance of stack overflow due to recursion with the top down merge sort algorithm.
Initially I assumed that Microsoft would not have switched to a less efficient top down merge sort when it switched to using iterators unless it was necessary, so I was looking for alternatives. It was only when I tried to analyze the issues (out of curiosity) that I realized that the original bottom up merge sort could be modified to work with iterators.
In @sbi's comment, he asked the author of the top down approach, Stephan T. Lavavej, why the change to iterators was made. Stephan's response was "to avoid memory allocation and default constructing allocators". VS2015 introduced non-default-constructible and stateful allocators, which presents an issue when using the prior version's array of lists, as each instance of a list allocates a dummy node, and a change would be needed to handle no default allocator.
Lavavej's solution was to switch to using iterators to keep track of run boundaries within the original list instead of an internal array of lists. The merge logic was changed to use 3 iterator parameters, 1st parameter is iterator to start of left run, 2nd parameter is iterator to end of left run == iterator to start of right run, 3rd parameter is iterator to end of right run. The merge process uses std::list::splice to move nodes within the original list during merge operations. This has the added benefit of being exception safe. If a caller's compare function throws an exception, the list will be re-ordered, but no loss of data will occur (assuming splice can't fail). With the prior scheme, some (or most) of the data would be in the internal array of lists if an exception occurred, and data would be lost from the original list.
I changed bottom up merge sort to use an array of iterators instead of an array of lists, where array[i] is an iterator to the start of a sorted run with 2^i nodes, or it is empty (using std::list::end to indicate empty, since iterators can't be null). Similar to the top down approach, the array of iterators is only used to keep track of sorted run boundaries within the original linked list, with the same merge logic as top down that uses std::list::splice to move nodes within the original linked list.
A single scan of the list is done, building up sorted runs to the left of the current scan.next position according to the sorted run boundaries in the array, until all nodes are merged into the sorted runs. Then the sorted runs are merged, resulting in a sorted list.
For example, for a list with 7 nodes, after the scan:
2 1 0 array index
run0->run0->run0->run0->run1->run1->run2->end
Then the 3 sorted runs are merged right to left via merge(left, right), so that the sort is stable.
If a linked list is large and the nodes are scattered, there will be a lot of cache misses, and top down will be about 40% to 50% slower than bottom up depending on the processor. Then again, if there's enough memory, it would usually be faster to move the list to an array or vector, sort the array or vector, then create a new list from the sorted array or vector.
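The vector round trip mentioned above could look roughly like this. It is a sketch under assumptions: T must be movable and comparable with operator<, sort_via_vector is a made-up name, and the sorted values are moved back into the existing nodes instead of building a new list. That saves node allocations, but it means iterators keep pointing at positions rather than at values, unlike std::list::sort.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Sorts a std::list by moving its elements into contiguous storage,
// sorting there, and moving them back. Needs O(n) extra memory.
template <typename T>
void sort_via_vector(std::list<T>& ll)
{
    // Move the elements out of the list nodes into a vector.
    std::vector<T> tmp(std::make_move_iterator(ll.begin()),
                       std::make_move_iterator(ll.end()));
    // stable_sort keeps equal elements in their original order,
    // matching the stability guarantee of std::list::sort.
    std::stable_sort(tmp.begin(), tmp.end());
    // Move the sorted values back into the existing nodes.
    std::move(tmp.begin(), tmp.end(), ll.begin());
}
```

Whether this actually beats a list merge sort depends on the element type and the allocation pattern, so it is worth measuring on the real data.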
Example C++ code for the bottom up merge sort with an array of iterators:
template <typename T>
typename std::list<T>::iterator Merge(std::list<T> &ll,
typename std::list<T>::iterator li,
typename std::list<T>::iterator ri,
typename std::list<T>::iterator ei);
// iterator array size
#define ASZ 32
template <typename T>
void SortList(std::list<T> &ll)
{
if (ll.size() < 2) // return if nothing to do
return;
typename std::list<T>::iterator ai[ASZ]; // array of iterator (bgn lft)
typename std::list<T>::iterator ri; // right iterator (end lft, bgn rgt)
typename std::list<T>::iterator ei; // end iterator (end rgt)
size_t i;
for (i = 0; i < ASZ; i++) // "clear" array
ai[i] = ll.end();
// merge nodes into array of runs
for (ei = ll.begin(); ei != ll.end();) {
ri = ei++;
for (i = 0; (i < ASZ) && ai[i] != ll.end(); i++) {
ri = Merge(ll, ai[i], ri, ei);
ai[i] = ll.end();
}
if(i == ASZ)
i--;
ai[i] = ri;
}
// merge array of runs into single sorted list
// ei = ll.end();
for(i = 0; (i < ASZ) && ai[i] == ei; i++);
ri = ai[i++];
while(1){
for( ; (i < ASZ) && ai[i] == ei; i++);
if (i == ASZ)
break;
ri = Merge(ll, ai[i++], ri, ei);
}
}
template <typename T>
typename std::list<T>::iterator Merge(std::list<T> &ll,
typename std::list<T>::iterator li,
typename std::list<T>::iterator ri,
typename std::list<T>::iterator ei)
{
typename std::list<T>::iterator ni;
ni = (*ri < *li) ? ri : li; // first node of the merged run
while(1){
if(*ri < *li){
ll.splice(li, ll, ri++);
if(ri == ei)
return ni;
} else {
if(++li == ri)
return ni;
}
}
}
Example replacement code for VS2019's std::list::sort(), in include file list. The merge logic was made into a separate internal function, since it's now used in two places. The call to _Sort from std::list::sort() is _Sort(begin(), end(), _Pred, this->_Mysize());, where _Pred is a pointer to the compare function (defaults to std::less()).
private:
template <class _Pr2>
iterator _Merge(_Pr2 _Pred, iterator _First, iterator _Mid, iterator _Last){
iterator _Newfirst = _First;
for (bool _Initial_loop = true;;
_Initial_loop = false) { // [_First, _Mid) and [_Mid, _Last) are sorted and non-empty
if (_DEBUG_LT_PRED(_Pred, *_Mid, *_First)) { // consume _Mid
if (_Initial_loop) {
_Newfirst = _Mid; // update return value
}
splice(_First, *this, _Mid++);
if (_Mid == _Last) {
return _Newfirst; // exhausted [_Mid, _Last); done
}
}
else { // consume _First
++_First;
if (_First == _Mid) {
return _Newfirst; // exhausted [_First, _Mid); done
}
}
}
}
template <class _Pr2>
void _Sort(iterator _First, iterator _Last, _Pr2 _Pred,
size_type _Size) { // order [_First, _Last), using _Pred, return new first
// _Size must be distance from _First to _Last
if (_Size < 2) {
return; // nothing to do
}
const size_t _ASZ = 32; // array size
iterator _Ai[_ASZ]; // array of iterator to run (bgn lft)
iterator _Mi; // middle iterator to run (end lft, bgn rgt)
iterator _Li; // last (end) iterator to run (end rgt)
size_t _I; // index to _Ai
for (_I = 0; _I < _ASZ; _I++) // "empty" array
_Ai[_I] = _Last; // _Ai[] == _Last => empty entry
// merge nodes into array of runs
for (_Li = _First; _Li != _Last;) {
_Mi = _Li++;
for (_I = 0; (_I < _ASZ) && _Ai[_I] != _Last; _I++) {
_Mi = _Merge(_Pass_fn(_Pred), _Ai[_I], _Mi, _Li);
_Ai[_I] = _Last;
}
if (_I == _ASZ)
_I--;
_Ai[_I] = _Mi;
}
// merge array of runs into single sorted list
for (_I = 0; _I < _ASZ && _Ai[_I] == _Last; _I++);
_Mi = _Ai[_I++];
while (1) {
for (; _I < _ASZ && _Ai[_I] == _Last; _I++);
if (_I == _ASZ)
break;
_Mi = _Merge(_Pass_fn(_Pred), _Ai[_I++], _Mi, _Last);
}
}
The remainder of this answer is historical, and only left for the historical comments, otherwise it is no longer relevant.
I was able to reproduce the issue (old sort fails to compile, new one works) based on a demo from @IgorTandetnik:
#include <iostream>
#include <list>
#include <memory>
template <typename T>
class MyAlloc : public std::allocator<T> {
public:
MyAlloc(T) {} // suppress default constructor
template <typename U>
MyAlloc(const MyAlloc<U>& other) : std::allocator<T>(other) {}
template< class U > struct rebind { typedef MyAlloc<U> other; };
};
int main()
{
std::list<int, MyAlloc<int>> l(MyAlloc<int>(0));
l.push_back(3);
l.push_back(0);
l.push_back(2);
l.push_back(1);
l.sort();
return 0;
}
I noticed this change back in July, 2016 and emailed P.J. Plauger about this change on August 1, 2016. A snippet of his reply:
Interestingly enough, our change log doesn't reflect this change. That
probably means it was "suggested" by one of our larger customers and
got by me on the code review. All I know now is that the change came
in around the autumn of 2015. When I reviewed the code, the first
thing that struck me was the line:
iterator _Mid = _STD next(_First, _Size / 2);
which, of course, can take a very long time for a large list.
The code looks a bit more elegant than what I wrote in early 1995(!),
but definitely has worse time complexity. That version was modeled
after the approach by Stepanov, Lee, and Musser in the original STL.
They are seldom found to be wrong in their choice of algorithms.
I'm now reverting to our latest known good version of the original code.
I don't know if P.J. Plauger's reversion to the original code dealt with the new allocator issue, or if or how Microsoft interacts with Dinkumware.
For a comparison of the top down versus bottom up methods, I created a linked list with 4 million elements, each consisting of one 64 bit unsigned integer, assuming I would end up with a doubly linked list of nearly sequentially ordered nodes (even though they would be dynamically allocated), filled them with random numbers, then sorted them. The nodes don't move, only the linkage is changed, but now traversing the list accesses the nodes in random order. I then filled those randomly ordered nodes with another set of random numbers and sorted them again. I compared the 2015 top down approach with the prior bottom up approach modified to match the other changes made for 2015 (sort() now calls sort() with a predicate compare function, rather than having two separate functions). These are the results. update - I added a node pointer based version and also noted the time for simply creating a vector from list, sorting vector, copy back.
sequential nodes: 2015 version 1.6 seconds, prior version 1.5 seconds
random nodes: 2015 version 4.0 seconds, prior version 2.8 seconds
random nodes: node pointer based version 2.6 seconds
random nodes: create vector from list, sort, copy back 1.25 seconds
For sequential nodes, the prior version is only a bit faster, but for random nodes, the prior version is 30% faster, and the node pointer version 35% faster, and creating a vector from the list, sorting the vector, then copying back is 69% faster.
Below is the first replacement code for std::list::sort() that I used to compare the prior bottom up method with a small array (_Binlist[]) against VS2015's top down approach. I wanted the comparison to be fair, so I modified a copy of <list>.
void sort()
{ // order sequence, using operator<
sort(less<>());
}
template<class _Pr2>
void sort(_Pr2 _Pred)
{ // order sequence, using _Pred
if (2 > this->_Mysize())
return;
const size_t _MAXBINS = 25;
_Myt _Templist, _Binlist[_MAXBINS];
while (!empty())
{
// _Templist = next element
_Templist._Splice_same(_Templist.begin(), *this, begin(),
++begin(), 1);
// merge with array of ever larger bins
size_t _Bin;
for (_Bin = 0; _Bin < _MAXBINS && !_Binlist[_Bin].empty();
++_Bin)
_Templist.merge(_Binlist[_Bin], _Pred);
// don't go past end of array
if (_Bin == _MAXBINS)
_Bin--;
// update bin with merged list, empty _Templist
_Binlist[_Bin].swap(_Templist);
}
// merge bins back into caller's list
for (size_t _Bin = 0; _Bin < _MAXBINS; _Bin++)
if(!_Binlist[_Bin].empty())
this->merge(_Binlist[_Bin], _Pred);
}
I made some minor changes. The original code kept track of the actual maximum bin in a variable named _Maxbin, but the overhead in the final merge is small enough that I removed the code associated with _Maxbin. During the array build, the original code's inner loop merged into a _Binlist[] element, followed by a swap into _Templist, which seemed pointless. I changed the inner loop to just merge into _Templist, only swapping once an empty _Binlist[] element is found.
Below is a node pointer based replacement for std::list::sort() I used for yet another comparison. This eliminates allocation related issues. If a compare exception is possible and occurred, all the nodes in the array and temp list (pNode) would have to be appended back to the original list, or possibly a compare exception could be treated as a less than compare.
void sort()
{ // order sequence, using operator<
sort(less<>());
}
template<class _Pr2>
void sort(_Pr2 _Pred)
{ // order sequence, using _Pred
const size_t _NUMBINS = 25;
_Nodeptr aList[_NUMBINS]; // array of lists
_Nodeptr pNode;
_Nodeptr pNext;
_Nodeptr pPrev;
if (this->size() < 2) // return if nothing to do
return;
this->_Myhead()->_Prev->_Next = 0; // set last node ->_Next = 0
pNode = this->_Myhead()->_Next; // set ptr to start of list
size_t i;
for (i = 0; i < _NUMBINS; i++) // zero array
aList[i] = 0;
while (pNode != 0) // merge nodes into array
{
pNext = pNode->_Next;
pNode->_Next = 0;
for (i = 0; (i < _NUMBINS) && (aList[i] != 0); i++)
{
pNode = _MergeN(_Pred, aList[i], pNode);
aList[i] = 0;
}
if (i == _NUMBINS)
i--;
aList[i] = pNode;
pNode = pNext;
}
pNode = 0; // merge array into one list
for (i = 0; i < _NUMBINS; i++)
pNode = _MergeN(_Pred, aList[i], pNode);
this->_Myhead()->_Next = pNode; // update sentinel node links
pPrev = this->_Myhead(); // and _Prev pointers
while (pNode)
{
pNode->_Prev = pPrev;
pPrev = pNode;
pNode = pNode->_Next;
}
pPrev->_Next = this->_Myhead();
this->_Myhead()->_Prev = pPrev;
}
template<class _Pr2>
_Nodeptr _MergeN(_Pr2 &_Pred, _Nodeptr pSrc1, _Nodeptr pSrc2)
{
_Nodeptr pDst = 0; // destination head ptr
_Nodeptr *ppDst = &pDst; // ptr to head or prev->_Next
if (pSrc1 == 0)
return pSrc2;
if (pSrc2 == 0)
return pSrc1;
while (1)
{
if (_DEBUG_LT_PRED(_Pred, pSrc2->_Myval, pSrc1->_Myval))
{
*ppDst = pSrc2;
pSrc2 = *(ppDst = &pSrc2->_Next);
if (pSrc2 == 0)
{
*ppDst = pSrc1;
break;
}
}
else
{
*ppDst = pSrc1;
pSrc1 = *(ppDst = &pSrc1->_Next);
if (pSrc1 == 0)
{
*ppDst = pSrc2;
break;
}
}
}
return pDst;
}
@sbi asked Stephan T. Lavavej, MSVC's standard library maintainer, who responded:
I did that to avoid memory allocation and default constructing
allocators.
To this I'll add "free basic exception safety".
To elaborate: the pre-VS2015 implementation suffers from several defects:
_Myt _Templist, _Binlist[_MAXBINS]; creates a bunch of intermediate lists (_Myt is simply a typedef for the current instantiation of list; a less confusing spelling for that is, well, list) to hold the nodes during sorting, but these lists are default constructed, which leads to a multitude of problems:
1. If the allocator used is not default constructible (and there is no requirement that allocators be default constructible), this simply won't compile, because the default constructor of list will attempt to default construct its allocator.
2. If the allocator used is stateful, then a default-constructed allocator may not compare equal to this->get_allocator(), which means that the later splices and merges are technically undefined behavior and may well break in debug builds. ("Technically", because the nodes are all merged back in the end, so you don't actually deallocate with the wrong allocator if the function successfully completes.)
3. Dinkumware's list uses a dynamically allocated sentinel node, which means that the above will perform _MAXBINS + 1 dynamic allocations. I doubt that many people expect sort to potentially throw bad_alloc. If the allocator is stateful, then these sentinel nodes may not be even allocated from the right place (see #2).
The code is not exception safe. In particular, the comparison is allowed to throw, and if it throws while there are elements in the intermediate lists, those elements are simply destroyed with the lists during stack unwinding. Users of sort don't expect the list to be sorted if sort throws an exception, of course, but they probably also don't expect the elements to go missing.
This interacts very poorly with #2 above, because now it's not just technical undefined behavior: the destructor of those intermediate lists will be deallocating and destroying the nodes spliced into them with the wrong allocator.
Are those defects fixable? Probably. #1 and #2 can be fixed by passing get_allocator() to the constructor of the lists:
_Myt _Templist(get_allocator());
_Myt _Binlist[_MAXBINS] = { _Myt(get_allocator()), _Myt(get_allocator()),
_Myt(get_allocator()), /* ... repeat _MAXBINS times */ };
The exception safety problem can be fixed by surrounding the loop with a try-catch that splices all the nodes in the intermediate lists back into *this without regard to order if an exception is thrown.
Fixing #3 is harder, because that means not using list at all as the holder of nodes, which probably requires a decent amount of refactoring, but it's doable.
The question is: is it worth jumping through all these hoops to improve the performance of a container that has reduced performance by design? After all, someone who really cares about performance probably won't be using list in the first place.

Sort when only equality is available

Suppose we have a vector of pairs:
std::vector<std::pair<A,B>> v;
where for type A only equality is defined:
bool operator==(A const & lhs, A const & rhs) { ... }
How would you sort it so that all pairs with the same first element end up next to each other? To be clear, the output I hope to achieve should be the same as what something like this produces:
std::unordered_multimap<A,B> m(v.begin(),v.end());
std::copy(m.begin(),m.end(),v.begin());
However I would like, if possible, to:
Do the sorting in place.
Avoid the need to define a hash function for equality.
Edit: additional concrete information.
In my case the number of elements isn't particularly big (I expect N = 10~1000), though I have to repeat this sorting many times (~400) as part of a bigger algorithm, and the datatype A is pretty big (among other things it contains an unordered_map with ~20 std::pair<uint32_t,uint32_t> in it, which is what prevents me from inventing an ordering and makes it hard to build a hash function).
First option: cluster() and sort_within()
The handwritten double loop by @MadScienceDreams can be written as a cluster() algorithm of O(N * K) complexity with N elements and K clusters. It repeatedly calls std::partition (using C++14 style with generic lambdas, easily adaptable to C++11, or even C++98 style by writing your own function objects):
template<class FwdIt, class Equal = std::equal_to<>>
void cluster(FwdIt first, FwdIt last, Equal eq = Equal{})
{
for (auto it = first; it != last; /* increment inside loop */)
it = std::partition(it, last, [=](auto const& elem){
return eq(elem, *it);
});
}
which you call on your input vector<std::pair> as
cluster(begin(v), end(v), [](auto const& L, auto const& R){
return L.first == R.first;
});
The next algorithm to write is sort_within which takes two predicates: an equality and a comparison function object, and repeatedly calls std::find_if_not to find the end of the current range, followed by std::sort to sort within that range:
template<class RndIt, class Equal = std::equal_to<>, class Compare = std::less<>>
void sort_within(RndIt first, RndIt last, Equal eq = Equal{}, Compare cmp = Compare{})
{
for (auto it = first; it != last; /* increment inside loop */) {
auto next = std::find_if_not(it, last, [=](auto const& elem){
return eq(elem, *it);
});
std::sort(it, next, cmp);
it = next;
}
}
On an already clustered input, you can call it as:
sort_within(begin(v), end(v),
[](auto const& L, auto const& R){ return L.first == R.first; },
[](auto const& L, auto const& R){ return L.second < R.second; }
);
Live Example that shows it for some real data using std::pair<int, int>.
Second option: user-defined comparison
Even if there is no operator< defined on A, you might define it yourself. Here, there are two broad options. First, if A is hashable, you can define
bool operator<(A const& L, A const& R)
{
return std::hash<A>()(L) < std::hash<A>()(R);
}
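and write std::sort(begin(v), end(v)) directly. You will have O(N log N) calls to std::hash if you don't want to cache all the unique hash values in a separate storage. Note that two unequal elements whose hashes collide compare equivalent under this ordering, so equal elements are only guaranteed to end up adjacent if no such collisions occur.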
Second, if A is not hashable, but does have data member getters x(), y() and z(), that uniquely determine equality on A: you can do
bool operator<(A const& L, A const& R)
{
return std::tie(L.x(), L.y(), L.z()) < std::tie(R.x(), R.y(), R.z());
}
Again you can write std::sort(begin(v), end(v)) directly.
If you can come up with a function that assigns a unique number to each distinct element, you can build a secondary array of these numbers and sort it, applying the same moves to the primary array (for example with merge sort).
This requires a function that assigns a unique number to each distinct element, i.e. a hash function without collisions. I think this should not be a problem.
As for complexity: if the hash function is O(1), building the secondary array is O(N) and sorting it together with the primary array is O(N log N), for a total of O(N + N log N) = O(N log N).
The downside of this solution is that it requires twice the memory.
In short, the idea is to translate your elements quickly into values that can be compared quickly.
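One way to get such numbers without a hash function, assuming only operator== on A, is to number each distinct key by the order in which it first appears, using a linear search over the keys seen so far. That costs O(N * K) equality comparisons for K distinct keys, so it is a sketch for small inputs like the N = 10~1000 mentioned in the question, not a general replacement for hashing. The name group_by_first is hypothetical, and the function uses a second array rather than working in place.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Groups pairs with equal .first next to each other, using only operator==
// on A. Groups appear in order of first occurrence; order inside a group
// is the original order.
template <class A, class B>
void group_by_first(std::vector<std::pair<A, B>>& v)
{
    std::vector<std::size_t> reps;  // index in v of the first element of each distinct key
    std::vector<std::pair<std::size_t, std::size_t>> tagged;  // (key id, original index)
    tagged.reserve(v.size());

    for (std::size_t i = 0; i < v.size(); ++i) {
        std::size_t id = 0;
        // Linear search for an already numbered key equal to v[i].first.
        while (id < reps.size() && !(v[reps[id]].first == v[i].first)) ++id;
        if (id == reps.size()) reps.push_back(i);  // first time this key is seen
        tagged.emplace_back(id, i);
    }

    // Integers are cheap to compare: sort by id, ties broken by position.
    std::sort(tagged.begin(), tagged.end());

    std::vector<std::pair<A, B>> out;  // the "double memory" noted above
    out.reserve(v.size());
    for (const auto& t : tagged) out.push_back(std::move(v[t.second]));
    v = std::move(out);
}
```

Because the ids are assigned deterministically, the result does not depend on hash values, which also makes the grouping order reproducible between runs.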
An in place algorithm is
for (int i = 0; i < n-2; i++)
{
// start at i+1 so that a neighbour that already matches is kept in place
for (int j = i+1; j < n; j++)
{
if (v[j].first == v[i].first)
{
std::swap(v[j],v[i+1]);
i++;
}
}
}
There is probably a more elegant way to write the loop, but this is O(n*m), where n is the number of elements and m is the number of distinct keys. So if m is much smaller than n (with the best case being that all the keys are the same), this can be approximated by O(n). In the worst case, the number of keys ~= n, so this is O(n^2). I have no idea what you expect for the number of keys, so I can't really do the average case, but it is most likely O(n^2) for the average case as well.
For a small number of keys, this may work faster than unordered multimap, but you'll have to measure to find out.
Note: the order of clusters is completely random.
Edit: (much more efficient in the partially-clustered case, doesn't change complexity)
for (int i = 0; i < n-2; i++)
{
    for(;i<n-2 && v[i+1].first==v[i].first; i++){}
    for (int j = i+2; j < n; j++)
    {
        if (v[j].first == v[i].first)
        {
            std::swap(v[j],v[i+1]);
            i++;
        }
    }
}
Edit 2: At /u/MrPisarik's comment, removed redundant i check in inner loop.
I'm surprised no one has suggested the use of std::partition yet. It makes the solution nice, elegant, and generic:
template<typename BidirIt, typename BinaryPredicate>
void equivalence_partition(BidirIt first, BidirIt last, BinaryPredicate p) {
    using element_type = typename std::decay<decltype(*first)>::type;
    if(first == last) {
        return;
    }
    auto new_first = std::partition
        (first, last, [=](element_type const &rhs) { return p(*first, rhs); });
    equivalence_partition(new_first, last, p);
}
template<typename BidirIt>
void equivalence_partition(BidirIt first, BidirIt last) {
    using element_type = typename std::decay<decltype(*first)>::type;
    equivalence_partition(first, last, std::equal_to<element_type>());
}
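Since the recursive call above is the last thing the function does, the same idea can also be sketched as a loop. This variant is an illustration only: it is specialised to std::vector<int>, the name cluster_equal is made up, and it copies the pivot value rather than dereferencing the iterator inside the predicate.

```cpp
#include <algorithm>
#include <vector>

// Groups equal ints together by repeatedly partitioning on the first
// element of the part that has not been grouped yet.
void cluster_equal(std::vector<int>& v)
{
    std::vector<int>::iterator first = v.begin();
    while (first != v.end()) {
        const int pivot = *first;  // copied, so swaps inside partition cannot change it
        // Everything equal to pivot moves to the front of [first, end);
        // the returned iterator marks the start of the remaining elements.
        first = std::partition(first, v.end(),
                               [pivot](int x) { return x == pivot; });
    }
}
```

Each pass places at least the pivot itself, so the loop terminates after one pass per distinct value. std::partition is not stable, so the order of the groups after the first is not specified.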

Insert multiple values into vector

I have a std::vector<T> variable. I also have two variables of type T, the first of which represents the value in the vector after which I am to insert, while the second represents the value to insert.
So let's say I have this container: 1,2,1,1,2,2
And the two values are 2 and 3 with respect to their definitions above. Then I wish to write a function which will update the container to instead contain:
1,2,3,1,1,2,3,2,3
I am using C++98 and Boost. What std or Boost functions might I use to implement this function?
Iterating over the vector and using std::vector::insert is one way, but it gets messy when one realizes that you need to remember to hop over the value you just inserted.
This is what I would probably do:
vector<T> copy;
for (vector<T>::iterator i=original.begin(); i!=original.end(); ++i)
{
    copy.push_back(*i);
    if (*i == first)
        copy.push_back(second);
}
original.swap(copy);
Put a call to reserve in there if you want. You know you need room for at least original.size() elements. You could also do an initial iteration over the vector (or use std::count) to determine the exact number of elements to reserve, but without testing, I don't know whether that would improve performance.
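For reference, the copy approach with the optional std::count pass and reserve call could be wrapped up roughly as follows. This is a C++98-compatible sketch; the function name insert_after_each is invented here, and whether the extra counting pass pays off is untested, as noted above.

```cpp
#include <algorithm>
#include <vector>

// Rebuilds `original` so that `second` follows every occurrence of `first`.
template <typename T>
void insert_after_each(std::vector<T>& original, const T& first, const T& second)
{
    typedef typename std::vector<T>::size_type size_type;

    // Extra pass: count the matches so the final size is known up front.
    const size_type matches = static_cast<size_type>(
        std::count(original.begin(), original.end(), first));

    std::vector<T> copy;
    copy.reserve(original.size() + matches);  // push_back below never reallocates

    for (typename std::vector<T>::const_iterator i = original.begin();
         i != original.end(); ++i)
    {
        copy.push_back(*i);
        if (*i == first)
            copy.push_back(second);
    }
    original.swap(copy);
}
```

Because the original vector is left untouched until the final swap, there is no inserted element to hop over while iterating.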
I propose a solution that works in place, using O(n) memory and O(2n) time, instead of the O(n^2) time of the solution proposed by Laethnes and the O(2n) memory of the solution proposed by Benjamin.
// First pass: count the elements equal to first.
std::size_t elems = std::count(data.begin(), data.end(), first);
// Resize so that the insertions below need no further reallocation.
data.resize(data.size() + elems);
vector<T>::reverse_iterator end = data.rbegin() + elems;
// Iterate from the end, moving elements from the old end to the new end (which makes room for the elements to insert).
for(vector<T>::reverse_iterator new_end = data.rbegin(); end != data.rend() && elems > 0; ++new_end,++end)
{
    // If the current element is the one we are looking for, write second first (we iterate from the end).
    if(*end == first)
    {
        *new_end = second;
        ++new_end;
        --elems;
    }
    // Copy the element to its new position.
    *new_end = *end;
}
This algorithm may be buggy, but the idea is to copy each element only once by:
First, counting how many elements we'll need to insert.
Second, going through the data from the end and moving each element to the new end.
This is what I probably would do:
typedef ::std::vector<int> MyList;
typedef MyList::iterator MyListIter;
MyList data;
// ... fill data ...
const int searchValue = 2;
const int addValue = 3;
// Find the first occurrence of the searched value.
MyListIter iter = ::std::find(data.begin(), data.end(), searchValue);
while(iter != data.end())
{
    // We want to add our value after the searched one.
    ++iter;
    // Insert the value and get an iterator pointing to the inserted position
    // (the original iterator is invalid now).
    iter = data.insert(iter, addValue);
    // This is needed to make sure that our inserted value is not searched again:
    // for example, if searchValue == addValue, the code would otherwise
    // loop forever.
    ++iter;
    // Search for the next occurrence.
    iter = ::std::find(iter, data.end(), searchValue);
}
But as you can see, I couldn't avoid the increment you mentioned. I don't think that is a bad thing, though: I would put this code into a separate function (probably in some kind of "core/utils" module) and, of course, implement it as a template, so it is written only once. Having to worry about the increment just once is, IMHO, very acceptable.
template <class ValueType>
void insertAfter(::std::vector<ValueType> &io_data,
                 const ValueType &i_searchValue,
                 const ValueType &i_insertAfterValue);
or even better (IMHO)
template <class ListType, class ValueType>
void insertAfter(ListType &io_data,
                 const ValueType &i_searchValue,
                 const ValueType &i_insertAfterValue);
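Only the declarations are given above. One possible definition of the generic version, simply wrapping the find/insert loop shown earlier, might look like this. It is a sketch that assumes ListType is a standard sequence container (vector, list or deque) and that neither value argument refers to an element of io_data.

```cpp
#include <algorithm>
#include <vector>

template <class ListType, class ValueType>
void insertAfter(ListType &io_data,
                 const ValueType &i_searchValue,
                 const ValueType &i_insertAfterValue)
{
    // Assumption: the two value arguments do not alias elements of io_data,
    // since insert() may move or reallocate the container's elements.
    typename ListType::iterator iter =
        std::find(io_data.begin(), io_data.end(), i_searchValue);
    while (iter != io_data.end())
    {
        ++iter;                                           // step past the match
        iter = io_data.insert(iter, i_insertAfterValue);  // points at the new element
        ++iter;                                           // step past the new element
        iter = std::find(iter, io_data.end(), i_searchValue);
    }
}
```

For a vector each insert shifts the tail, so this remains O(n^2) in the worst case; the benefit is only that the increment bookkeeping lives in one place.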
EDIT:
Well, I would solve the problem in a slightly different way: first count the occurrences of the searched value (preferably storing the count in some kind of cache that can be kept and reused), so that the array can be prepared in advance with only one allocation, and then use memcpy to move the original values (for types like int only, of course), or memmove if the vector's allocated size is already sufficient.
In place, O(1) additional memory and O(n) time (Live at Coliru):
template <typename T, typename A>
void do_thing(std::vector<T, A>& vec, T target, T inserted) {
    using std::swap;
    typedef typename std::vector<T, A>::size_type size_t;
    const size_t occurrences = std::count(vec.begin(), vec.end(), target);
    if (occurrences == 0) return;
    const size_t original_size = vec.size();
    vec.resize(original_size + occurrences, inserted);
    for(size_t i = original_size - 1, end = i + occurrences; i > 0; --i, --end) {
        if (vec[i] == target) {
            --end;
        }
        swap(vec[i], vec[end]);
    }
}