Iterate through part of a list in C++ - c++

I am new to C++ and I am working on an implementation of merge sort to help familiarize myself with the language.
Currently I have a list of integers, and I want to create two sublists: store the first half of the original list in a list named 'left' and the remaining half in a list named 'right'.
Ex: Assume my original list data is 16, 24, 56, 12, 89; I want to iterate through this list adding
16, 24 to a new sublist 'left'
and adding 56, 12, 89 to sublist 'right'
So left would result in [16, 24]
and right would be [56, 12, 89]
Here is the code I have right now; what conditions should I write in my if statements?
('l' is the name of the list that is passed in the function's parameters)
list<int> left, right;
int midpt = l.size() / 2;
for (listIt = l.begin(); listIt != l.end(); listIt++) {
    if (/* ??? */) left.push_back(*listIt);
    if (/* ??? */) right.push_back(*listIt);
}

This might be easier:
#include <iterator>
#include <list>
auto middle = std::next(l.begin(), l.size() / 2);
std::list<int> left(l.begin(), middle), right(middle, l.end());
This constructs the two new lists, left and right, directly from the respective ranges. std::next returns an iterator obtained by advancing the given iterator by the given number of steps. Note that std::list<int>::size() has constant runtime complexity as of C++11, though std::next itself still does a linear amount of work here, because list iterators are only bidirectional.

This would work:
list<int> left, right;
int midpt = l.size() / 2;
int count = 0;
for (list<int>::iterator listIt = l.begin(); listIt != l.end(); listIt++) {
    if (count++ < midpt)
        left.push_back(*listIt);   // first half goes to 'left'
    else
        right.push_back(*listIt);  // the rest goes to 'right'
}

Use std::list::insert with an iterator range instead of a hand-written loop. Note that list iterators are not random access, so you cannot write l.begin() + midpt; use std::next instead. Try:
left.insert(left.begin(), l.begin(), std::next(l.begin(), midpt));
right.insert(right.begin(), std::next(l.begin(), midpt), l.end());

Given that std::list defines a doubly linked list, it might be worth considering a somewhat different approach. Instead of finding the middle of the list, and adding what's left of that to one list, and what's right of it to the other, you might consider adding the first element to your left list, the last element to your right list, advance both iterators, and repeat (until the iterators meet).
Most of the other methods require a linear traversal of the list to find the middle, followed by linear traversal of both sides to build the two new lists. In other words, you traverse the list 1.5 times.
By traversing from both the front and back simultaneously, you can do the whole job by traversing the list only once. In terms of big-O complexity these are equivalent (O(N) in both cases), but in more practical terms we can expect the single traversal to take roughly two-thirds the time of traversing the list 1.5 times (though depending on the list size, caching will probably affect that to at least some degree).
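Here is a minimal sketch of that approach, assuming the input is a std::list<int> named l as in the question; the extra element of an odd-length list goes to right, matching the example above:
#include <list>

void split(const std::list<int>& l, std::list<int>& left, std::list<int>& right) {
    auto front = l.begin();
    auto back = l.end();
    while (front != back) {
        right.push_front(*--back);  // take one element from the back
        if (front == back)          // odd length: the extra element went right
            break;
        left.push_back(*front++);   // take one element from the front
    }
}
For {16, 24, 56, 12, 89} this produces left == {16, 24} and right == {56, 12, 89}, touching each element exactly once.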

Related

Efficient intersection of two sets

I have two sets (or maps) and need to efficiently handle their intersection.
I know that there are two ways of doing this:
iterate over both maps as in std::set_intersection: O(n1+n2)
iterating over one map and finding elements in the other: O(n1*log(n2))
Depending on the sizes, either of these two solutions is significantly better (I have timed it), so I need to either switch between the algorithms based on the sizes (which is a bit messy), or find a solution outperforming both, e.g. using some variant of map.find() that takes the previous iterator as a hint (similar to map.emplace_hint(...)) - but I could not find such a function.
Question: Is it possible to combine the performance characteristics of the two solutions directly using STL - or some compatible library?
Note that the performance requirement makes this different from earlier questions such as
Efficient intersection of sets?
In almost every case std::set_intersection will be the best choice.
The other solution may be better only if the sets contain a very small number of elements.
This is due to the nature of the base-two logarithm, which scales as:
n = 2, log(n) = 1
n = 4, log(n) = 2
n = 8, log(n) = 3
.....
n = 1024, log(n) = 10
So O(n1*log(n2)) is significantly more expensive than O(n1 + n2) once the sets contain more than about 5-10 elements.
There is a reason such a function was added to the STL and implemented the way it is. It also makes the code more readable.
Selection sort is faster than merge sort or quicksort for collections with fewer than about 20 elements, but it is rarely used.
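For reference, a typical std::set_intersection call looks like this (the values are illustrative); both input ranges must be sorted, which std::set already guarantees:
#include <algorithm>
#include <iterator>
#include <set>
#include <vector>

int main() {
    std::set<int> s1{1, 2, 4, 8};
    std::set<int> s2{2, 4, 6};
    std::vector<int> out;
    // Linear merge over the two sorted ranges: O(n1 + n2) comparisons.
    std::set_intersection(s1.begin(), s1.end(),
                          s2.begin(), s2.end(),
                          std::back_inserter(out));  // out == {2, 4}
}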
For sets that are implemented as binary trees, there actually is an algorithm that combines the benefits of both the procedures you mention. Essentially, you do a merge like std::set_intersection, but while iterating in one tree, you skip any branches that are all less than the current value in the other.
The resulting intersection takes O(min(n1 log n2, n2 log n1, n1 + n2)), which is just what you want.
Unfortunately, I'm pretty sure std::set doesn't provide interfaces that could support this operation.
I've done it a few times in the past though, when working on joining inverted indexes and similar things. Usually I make iterators with a skipTo(x) operation that will advance to the next element >= x. To meet my promised complexity it has to be able to skip N elements in log(N) amortized time. Then an intersection looks like this:
template <typename T>
void get_intersection(vector<T>* dest, const set<T>& set1, const set<T>& set2)
{
    auto end1 = set1.end();
    auto end2 = set2.end();
    auto it1 = set1.begin();
    if (it1 == end1)
        return;
    auto it2 = set2.begin();
    if (it2 == end2)
        return;
    for (;;)
    {
        it1.skipTo(*it2);            // advance to the next element >= *it2
        if (it1 == end1)
            break;
        if (*it1 == *it2)
        {
            dest->push_back(*it1);   // common element found
            ++it1;
            if (it1 == end1)         // don't dereference a finished iterator below
                break;
        }
        it2.skipTo(*it1);            // advance to the next element >= *it1
        if (it2 == end2)
            break;
        if (*it2 == *it1)
        {
            dest->push_back(*it2);
            ++it2;
            if (it2 == end2)
                break;
        }
    }
}
It easily extends to an arbitrary number of sets using a vector of iterators, and pretty much any ordered collection can be extended to provide the iterators required -- sorted arrays, binary trees, b-trees, skip lists, etc.
I don't know how to do this using the standard library, but if you wrote your own balanced binary search tree, here is how to implement a limited "find with hint". (Depending on your other requirements, a BST reimplementation could also leave out the parent pointers, which could be a performance win over the STL.)
Assume that the hint value is less than the value to be found and that we know the stack of ancestors of the hint node to whose left sub-tree the hint node belongs. First search normally in the right sub-tree of the hint node, pushing nodes onto the stack as warranted (to prepare the hint for next time). If this doesn't work, then while the stack's top node has a value that is less than the query value, pop the stack. Search from the last node popped (if any), pushing as warranted.
I claim that, when using this mechanism to search successively for values in ascending order, (1) each tree edge is traversed at most once, and (2) each find traverses the edges of at most two descending paths. Given 2*n1 descending paths in a binary tree with n2 nodes, the cost of the edges is O(n1 log n2). It's also O(n2), because each edge is traversed once.
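Here is a sketch of that mechanism; the Node type, the ascending-order assumption, and the requirement that every queried key is present in the tree are simplifications of this illustration, not part of the answer above:
#include <vector>

// Hypothetical node of a hand-rolled BST without parent pointers.
struct Node {
    double key;
    Node* left;
    Node* right;
};

// 'hint' is the previously found node (nullptr on the first call); 'stack'
// holds the ancestors of the hint into whose left subtree the path descended.
Node* find_with_hint(Node* root, Node*& hint, std::vector<Node*>& stack, double value) {
    // Ordinary BST descent that records every node we turn left at.
    auto descend = [&stack](Node* start, double v) -> Node* {
        for (Node* cur = start; cur != nullptr; ) {
            if (v < cur->key) {
                stack.push_back(cur);   // remember the ancestor for later queries
                cur = cur->left;
            } else if (cur->key < v) {
                cur = cur->right;
            } else {
                return cur;             // exact hit
            }
        }
        return nullptr;
    };

    if (hint == nullptr)                // first query: plain search from the root
        return hint = descend(root, value);

    // 1. Search normally in the right subtree of the hint.
    if (Node* hit = descend(hint->right, value))
        return hint = hit;

    // 2. Pop ancestors whose keys are already below the query value.
    Node* last = nullptr;
    while (!stack.empty() && stack.back()->key < value) {
        last = stack.back();
        stack.pop_back();
    }
    // The new stack top may itself be the match (a case the prose glosses over).
    if (!stack.empty() && stack.back()->key == value) {
        Node* hit = stack.back();
        stack.pop_back();               // it becomes the new hint, not an ancestor
        return hint = hit;
    }
    // 3. Resume the search from the last node popped (if any).
    return hint = (last != nullptr ? descend(last->right, value) : nullptr);
}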
With regard to the performance requirement, O(n1 + n2) is in most circumstances a very good complexity, so this is only worth optimizing if you're doing the calculation in a tight loop.
If you really do need it, the combination approach isn't too bad; perhaps something like this?
Pseudocode:
x' = set_with_min_length([x, y])
y' = set_with_max_length([x, y])
if (x'.length * log(y'.length)) <= (x'.length + y'.length):
return iterate_over_map_find_elements_in_other(y', x')
return std::set_intersection(x, y)
I don't think you'll find an algorithm that beats either of these complexities, but I'm happy to be proven wrong.
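A concrete C++ rendering of that pseudocode might look like the following; the crossover test is just the rough cost estimate from above and should really be tuned by measurement:
#include <algorithm>
#include <cmath>
#include <iterator>
#include <set>
#include <vector>

std::vector<int> intersect(const std::set<int>& a, const std::set<int>& b) {
    const std::set<int>& small = a.size() <= b.size() ? a : b;
    const std::set<int>& large = a.size() <= b.size() ? b : a;
    std::vector<int> out;
    if (small.size() * std::log2(large.size() + 1.0) <= small.size() + large.size()) {
        // Iterate over the smaller set and look up in the larger: O(n1*log(n2)).
        for (int v : small)
            if (large.count(v))
                out.push_back(v);
    } else {
        // Linear merge over both sorted ranges: O(n1 + n2).
        std::set_intersection(small.begin(), small.end(),
                              large.begin(), large.end(),
                              std::back_inserter(out));
    }
    return out;
}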

How to do fast sorting in sorted list when only one element is changed

I need a list of elements that is always sorted. The operations involved are quite simple; for example, if the list is sorted from high to low, I only need three operations in some loop task:
while (true) {
    list.sort();              // sort the list that has hundreds of elements
    val = list[0];            // get the first/maximum value in the list
    list.pop_front();         // remove the first/maximum element
    // ... do some work here ...
    list.push_back(new_elem); // insert a new element
    list.sort();
}
However, since I only add one element at a time, and I have speed concerns, I don't want the sort to go through all the elements on every iteration (e.g. the way a bubble sort would). So I just wonder if there is a function to insert the element in order, or whether list::sort() is smart enough to take a quick path when only one element has been added/modified?
Or maybe should I use a deque for better performance, if the above are all the operations needed?
Thanks a lot!
As mentioned in the comments, if you aren't locked into std::list then you should try std::set or std::multiset.
The std::list::insert method takes an iterator which specifies where to add the new item. You can use std::lower_bound to find the correct insertion point; it's not optimal without random access iterators but it still only does O(log n) comparisons.
P.S. don't use variable names that collide with built-in classes like list.
lst.sort(std::greater<T>()); // sort once, descending
while (true) {
    val = lst.front();       // get the first/maximum value in the list
    lst.pop_front();         // remove the first/maximum element
    // ... do some work here ...
    std::list<T>::iterator it =
        std::lower_bound(lst.begin(), lst.end(), new_elem, std::greater<T>());
    lst.insert(it, new_elem); // insert the new element at its sorted position
    // lst is already sorted
}
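If switching containers is an option, here is a sketch of the std::multiset suggestion from above (the values are illustrative); the container keeps itself sorted, so the explicit sort calls disappear entirely:
#include <functional>
#include <set>

int main() {
    std::multiset<int, std::greater<int>> s{16, 24, 56};  // kept sorted high to low
    int val = *s.begin();    // first/maximum value, O(1)
    s.erase(s.begin());      // remove it, amortized O(1)
    // ... do some work here ...
    s.insert(val / 2);       // insert a new element in order, O(log n)
}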

Validity of std::map::iterator after erasing elements

I have written code for solving the following problem: we have a map<double,double> with a (relatively) huge number of items. We want to merge adjacent items in order to reduce the size of the map, keeping a certain "loss factor" as low as possible.
To do so, I first populate a list containing adjacent iterators and the associated loss factor (let's say each list element has the following type:
struct myPair {
map<double,double>::iterator curr, next;
double loss;
myPair(map<double,double>::iterator c, map<double,double>::iterator n,
double l): curr(c), next(n), loss(l) {}
};
). This is done as follows:
for (map<double,double>::iterator it1 = myMap.begin(); it1 != --(myMap.end());
     it1++) {
    map<double,double>::iterator it2 = it1; it2++;
    double l = computeLoss(it1, it2);
    List.push(myPair(it1, it2, l));
}
Then I find the list element corresponding to the lowest loss factor, erase the corresponding elements from the map, and insert a new element (the result of merging curr and next) into the map. Since this also changes the list elements corresponding to the element after next or before curr, I update those entries and their associated loss factors as well.
(I don't get into the details of how to implement the above efficiently but basically I am combining a double linked list and a heap).
While the erase operations should not invalidate the remaining iterators, for some specific input instances of the program I get a double free or corruption error exactly at the point where I attempt to erase the elements from the map.
I tried to track this down, and it seems to happen when both the first and second entries of the two map elements are very close (more precisely, when the keys of curr and next are very close).
A strange thing is that I put an assert while populating the list to ensure that in all entries curr and next are different and the same assert in the loop of removing elements. The second one fails!
I would appreciate if anyone can help me.
P.S. I am sorry for not being very precise but I wanted to keep the details as low as possible.
UPDATE: This is (a very simplified version of) how I erase the elements from the map:
while (myMap.size() > MAX_SIZE) {
    t = list.getMin();
    /* compute the merged version ... let's call the result (a,b) */
    myMap.erase(t.curr);
    myMap.erase(t.next);
    myMap.insert(pair<double,double>(a, b));
    /* update the adjacent entries */
}
The iterators stored in myPair become invalid once the elements they refer to are erased from the map; you should avoid this kind of technique. If you look into the standard headers, you will probably find some ready-made pieces for your task.
As already mentioned by others, it turns out that using double as the key of the map is problematic, in particular when the key values are computed.
Hence, my solution was to use std::multimap instead of map (and then merge the elements with the same key just after populating it). With this, even if a is very close to both the keys of t.curr and t.next, or to any other element, the insert operation is guaranteed to create a new element, so no existing iterator in the list can end up pointing to it.
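A small illustration of why std::multimap sidesteps the problem (the values are made up): insert always creates a brand-new element, even when the key compares equal to an existing one, so no previously stored iterator can silently end up referring to the inserted element:
#include <map>

int main() {
    std::multimap<double, double> m;
    m.insert({1.0, 10.0});
    auto it = m.insert({1.0, 20.0});  // duplicate key: a fresh node is created
    // 'it' refers only to the element just inserted; every existing
    // iterator into 'm' remains valid and still points where it did.
}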

Append two containers in constant time

I am looking for a way to append two containers in constant (or at least minimal linear) time.
I noticed linked lists have a merge, but it interleaves the elements by order (both lists must already be sorted). Isn't there a container/method to just re-link one container onto another (say, like list1.last_element.next = list2.first_element)?
You can use the std::list::splice method:
std::list<int> list1;
std::list<int> list2;
list1.splice(list1.end(), list2, list2.begin(), list2.end());
This code appends the contents of the list2 to the end of list1.
As Dietmar Kuhl mentioned, the method needs to count the elements in the range you are inserting:
[list2.begin(), list2.end())
so if you provide a range, the complexity is linear. However if you know that you want to append an entire list you can simply do
list1.splice(list1.end(), list2);
in O(1) time.
For std::list<T> there is splice(), which can be used to transfer nodes from one list to another. Sadly, this method was made linear in the length of the spliced sequence when splicing a range given by two iterators between two std::list<T> objects. This change was done in favor of having a constant time size() operation.
std::list<int> l1({ 1, 2, 3 });
std::list<int> l2({ 4, 5, 6 });
l1.splice(l1.end(), l2); // O(1): l1 is now {1, 2, 3, 4, 5, 6} and l2 is empty

A fast algorithm for sorting and shuffling equal valued entries (preferably with the STL)

I'm currently developing stochastic optimization algorithms and have encountered the following issue (which I imagine also appears in other places); it could be called a totally unstable partial sort:
Given a container of size n and a comparator, such that entries may be equally valued.
Return the best k entries, but if values are equal, it should be (nearly) equally probable to receive any of them.
(Output order is irrelevant to me, i.e. equal values lying entirely within the best k need not be shuffled. Having all equal values shuffled is, however, a related and interesting question and would also suffice!)
A very (!) inefficient way would be to shuffle the whole container randomly and then partial_sort it, but one actually only needs to shuffle the block of equally valued entries "at the selection border" (or all blocks of equally valued entries; both are much faster). Maybe that observation is where to start...
I would very much prefer it if someone could provide a solution built from STL algorithms (or at least to a large portion), both because they're usually very fast, well encapsulated and OMP-parallelized.
Thanks in advance for any ideas!
You want to partial_sort first. Then, while elements are not equal, return them. When you meet a run of equal elements that is larger than the remaining k, shuffle it and return the first k of them; otherwise return the whole run and continue.
I'm not fully understanding your issue, but if it were me solving it (if I am reading it correctly)...
Since it appears you will have to traverse the given object anyway, you might as well build a copy of it for your results, sorting it on insert and randomizing your "equal" items as you insert.
In other words, copy the items from the given container into an STL container, but overload the comparison operator to create a B-tree, and if two items compare equal on insert, randomly choose to insert the new one before or after the current item.
This way it's optimally traversed (since it's a tree) and you get the random order of the items that are equal each time the list is built.
It's double the memory, but I was reading this as you didn't want to alter the original list. If you don't care about losing the original, delete each item from the original as you insert into your new list. The worst traversal will be the first time you call your function since the passed in list might be unsorted. But since you are replacing the list with your sorted copy, future runs should be much faster and you can pick a better pivot point for your tree by assigning the root node as the element at length() / 2.
Hope this is helpful, sounds like a neat project. :)
If you really mean that output order is irrelevant, then you want std::nth_element, rather than std::partial_sort, since it is generally somewhat faster. Note that std::nth_element puts the nth element in the right position, so you can do the following, which is 100% standard algorithm invocations (warning: not tested very well; fencepost error possibilities abound):
#include <algorithm>
#include <iterator>

template<typename RandomIterator, typename Compare>
void best_n(RandomIterator first,
            RandomIterator nth,
            RandomIterator limit,
            Compare cmp) {
    using ref = typename std::iterator_traits<RandomIterator>::reference;
    // Place the nth-best element at 'nth'; better elements end up before it.
    std::nth_element(first, nth, limit, cmp);
    // Move elements equivalent to *nth to the back of the front part...
    auto p = std::partition(first, nth, [&](ref a){ return cmp(a, *nth); });
    // ...and to the front of the tail...
    auto q = std::partition(nth + 1, limit, [&](ref a){ return !cmp(*nth, a); });
    // ...then shuffle the contiguous block of elements equivalent to *nth.
    std::random_shuffle(p, q); // See note
}
The function takes three iterators, like nth_element, where nth is an iterator to the nth element, which means that it is begin() + (n - 1).
Edit: Note that this is different from most STL algorithms, in that it is effectively an inclusive range. In particular, it is UB if nth == limit, since it is required that *nth be valid. Furthermore, there is no way to request the best 0 elements, just as there is no way to ask for the 0th element with std::nth_element. You might prefer it with a different interface; do feel free to do so.
Or you might call it like this, after requiring that 0 < k <= n:
best_n(container.begin(), container.begin()+(k-1), container.end(), cmp);
It first uses nth_element to put the "best" k elements in positions 0..k-1, guaranteeing that the kth element (or one of them, anyway) is at position k-1. It then repartitions the elements preceding position k-1 so that the equal elements are at the end, and the elements following position k-1 so that the equal elements are at the beginning. Finally, it shuffles the equal elements.
nth_element is O(n); the two partition operations sum up to O(n); and random_shuffle is O(r) where r is the number of equal elements shuffled. I think that all sums up to O(n) so it's optimally scalable, but it may or may not be the fastest solution.
Note: You should use std::shuffle instead of std::random_shuffle, passing a uniform random number generator through to best_n. But I was too lazy to write all the boilerplate to do that and test it. Sorry.
If you don't mind sorting the whole list, there is a seemingly simple answer: randomize the result in your comparator for equivalent elements. Be aware, though, that a comparator with random results does not meet the strict weak ordering requirement of std::sort, so this is formally undefined behavior; a safer variant follows the snippet.
std::sort(validLocations.begin(), validLocations.end(),
          [&](const Point& i_point1, const Point& i_point2)
          {
              if (i_point1.mX == i_point2.mX)
              {
                  return Rand(1.0f) < 0.5;
              }
              else
              {
                  return i_point1.mX < i_point2.mX;
              }
          });
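A sketch of that safer variant; Point and mX come from the snippet above, the rest is illustrative. Shuffling first gives equal elements a uniformly random relative order, and std::stable_sort then preserves exactly that order within each group of equal keys, while the comparator itself remains a valid strict weak ordering:
#include <algorithm>
#include <random>
#include <vector>

struct Point { float mX; };

void sort_with_random_ties(std::vector<Point>& validLocations, std::mt19937& rng) {
    // Randomize first: ties end up in uniformly random relative order.
    std::shuffle(validLocations.begin(), validLocations.end(), rng);
    // The deterministic comparator satisfies std::stable_sort's requirements;
    // stability preserves the shuffled order among equal keys.
    std::stable_sort(validLocations.begin(), validLocations.end(),
                     [](const Point& a, const Point& b) { return a.mX < b.mX; });
}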