Why isn't std::list::splice a free function? - c++

Splice is a member function that puts part of a linked list into another linked list in constant time.
Why does it need to be a member function? I would expect that I can splice with just iterators into the lists, without having a handle on the list itself. Why should the list to be spliced need to be an argument in addition to the start and end iterators?
For testing, I made three lists and mixed up the containers and iterators.
See in the splice below where the containers (empty) don't match the iterators (test0 and test1):
list<int> test0;
list<int> test1;
list<int> empty;
test0.push_back(1);
test0.push_back(2);
test0.push_back(3);
test1.push_back(4);
test1.push_back(5);
test1.push_back(6);
empty.splice(test0.end(), empty, test1.begin(), test1.end());
printf("empty size: %zu\n", empty.size());
printf("test0 size: %zu\n", test0.size());
printf("test1 size: %zu\n", test1.size());
for (const auto& i : test0) {
printf("%d\n", i);
}
Surprisingly, it all worked fine, even the size!
empty size: 0
test0 size: 6
test1 size: 0
1
2
3
4
5
6
I can somewhat understand the iteration working because it just runs until next is null, without regard to the container's front/back pointers. But how did it get the size right? Maybe size is calculated dynamically?
Edit: Based on this explanation of size, size is calculated dynamically for lists, in linear time. So the container is really just a dummy argument. Maybe it's only needed when adding new elements because it has the allocator for making new nodes in the list?

std::list::splice modifies the size of the container. You can't use a container's iterators to modify its size. You'll notice that there are no free functions in the standard library that can insert new elements into a range using only iterators. At best they can rearrange them.
For example, std::remove shifts the elements to keep toward the front of the range and returns an iterator marking the new logical end; everything from that iterator onward is left over and still has to be erased by the container. It can't actually remove elements from the range itself.
There are some workarounds, such as using std::back_inserter, but that works by simulating an unbounded range.
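To make the std::remove point concrete, here is a minimal sketch, assuming a std::vector<int>. The helper name remove_then_erase is made up for illustration; it simply reports the size before and after the container gets involved.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns {size after std::remove alone, size after erase}.
std::pair<std::size_t, std::size_t> remove_then_erase(std::vector<int>& v, int value) {
    // std::remove only sees iterators, so it can shift the kept elements
    // forward but cannot shrink the vector.
    auto new_end = std::remove(v.begin(), v.end(), value);
    std::size_t size_after_remove = v.size();

    // Only a member function of the container can change its size.
    v.erase(new_end, v.end());
    return {size_after_remove, v.size()};
}
```

The first value in the returned pair equals the original size, which is the same reason splice has to go through the list object rather than through iterators alone.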

I was looking at the std::list::splice implementation the other day.
Typically the iterator abstracts a pointer to the list's private node type. The list nodes contain _M_prev and _M_next pointers to their neighbor nodes; this is purely implementation dependent. For an empty list, the list contains a sentinel node which serves as both head and tail (again implementation dependent).
So I thought I would try to implement splice using only the list nodes:
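void splice(const_iterator pos, list& other,
            const_iterator first, const_iterator last)
{
    // Pseudocode: treats the iterators as if they were the node pointers.
    if (first == last) return;
    auto tail = last->_M_prev; // last node actually moved; `last` is one past the end
    // Unlink [first, last) from the source list.
    first->_M_prev->_M_next = last;
    last->_M_prev = first->_M_prev;
    // Link the range in just before pos.
    first->_M_prev = pos->_M_prev;
    pos->_M_prev->_M_next = first;
    tail->_M_next = pos;
    pos->_M_prev = tail;
}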
I think that looks correct, but I could be wrong.
So based on that implementation, and if size is calculated dynamically, then that would work as you said.
However as François Andrieux pointed out, size being calculated dynamically would be wasteful, and so the container needs to be involved so that the internal size count can be maintained.
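To show where the size count comes in, here is a rough sketch of a hand-rolled list with a sentinel node and a stored count. Node, TinyList and their members are hypothetical names and do not reflect the real libstdc++ layout; the point is only that splice has to touch both list objects to keep their counts right.

```cpp
#include <cstddef>

// Hypothetical node type, not the real libstdc++ layout.
struct Node {
    Node* prev;
    Node* next;
    int value;
};

// Hypothetical minimal list: a circular chain around a sentinel, plus a count.
struct TinyList {
    Node sentinel;      // sentinel.next is the first node, sentinel.prev the last
    std::size_t count;  // stored size, so reading it is O(1)

    TinyList() : sentinel{&sentinel, &sentinel, 0}, count(0) {}
    TinyList(const TinyList&) = delete;
    TinyList& operator=(const TinyList&) = delete;

    ~TinyList() {
        for (Node* n = sentinel.next; n != &sentinel;) {
            Node* next = n->next;
            delete n;
            n = next;
        }
    }

    void push_back(int v) {
        Node* n = new Node{sentinel.prev, &sentinel, v};
        sentinel.prev->next = n;
        sentinel.prev = n;
        ++count;
    }

    // Moves [first, last) out of `other` and links it in just before `pos`.
    void splice(Node* pos, TinyList& other, Node* first, Node* last) {
        if (first == last) return;
        // Counting the range is the linear part; it is only needed
        // because both lists store their size.
        std::size_t n = 0;
        for (Node* p = first; p != last; p = p->next) ++n;

        Node* tail = last->prev;  // last node that actually moves
        // Unlink the range from the source chain.
        first->prev->next = last;
        last->prev = first->prev;
        // Link the range in before pos.
        first->prev = pos->prev;
        pos->prev->next = first;
        tail->next = pos;
        pos->prev = tail;

        other.count -= n;
        count += n;
    }
};
```

Without the two count updates the relinking alone would still work, but each list would then report a stale size, which is why the container argument cannot be a dummy once size is stored.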

Related

C++ container set + array functionality

Which is the best container in C++ which can -
store only unique values (such as set)
can lookup those values using index in constant time (such as array)
I basically need to iterate in phase one and collect all the unique elements; order really doesn't matter.
However, in phase two, I then have to provide each element in the container, but can only provide it one by one. Since the caller can know the size of my container, it provides me indices one by one, such that 0 <= idx < size of the container.
Right now, the only solution that comes to my mind is to maintain two containers, a vector and a set. I am wondering whether there is any single container that provides the same?
class MyContainer{
private:
std::set<Fruits> setFruits;
std::vector<Fruits> arrFruits; // can have indexed access
public:
void collectFruits(const Fruits& fruit){
if(setFruits.find(fruit) == setFruits.end()){
// insert only if it isn't already present
setFruits.insert(fruit);
arrFruits.push_back(fruit);
}
}
};
Alex Stepanov, the creator of STL, once said "Use vectors whenever you can. If you cannot use vectors, redesign your solution so that you can use vectors." With that good advice in mind:
Phase 1: Collect the unique elements
std::vector<Foo> elements;
// add N elements
elements.push_back(foo1);
...
elements.push_back(fooN);
// done collecting: remove dupes
std::sort(elements.begin(), elements.end());
elements.erase(std::unique(elements.begin(), elements.end()),
elements.end());
Phase 2: Well, now we have a vector of our k unique elements, with constant-time index access (with indices 0..k-1).
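A compact version of both phases might look like the following sketch. The function name collect_unique and the use of std::string as the element type are assumptions for illustration; any type with operator< and operator== would do.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: takes the collected elements (duplicates allowed)
// and returns them sorted with duplicates removed.
std::vector<std::string> collect_unique(std::vector<std::string> elements) {
    // std::unique only collapses adjacent duplicates, so sort first.
    std::sort(elements.begin(), elements.end());
    elements.erase(std::unique(elements.begin(), elements.end()),
                   elements.end());
    return elements;
}
```

After the call, the result can be indexed with operator[] in constant time. The trade-off is that the original insertion order is lost, which the question says does not matter.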
You could use a boost flat_set.
I don't think it provides an operator[] but it has random access iterators and has a constant time nth() function that returns an iterator with a particular index.
Inserting may invalidate iterators but providing you do all insertions in phase 1 and then all index access in phase 2 you should be ok.

Is it at all possible to erase from a vector with C++11's for loops?

Alright. To distinguish this from simpler, less explanatory questions it might resemble: I am not asking whether this is possible (I have already found that out). I am asking whether there is a lighter alternative to what I am doing.
What I have is what would be considered a main class, and in that main class, there is a variable that references to a 'World Map' class. In essence, this 'WorldMap' class is a container of other class variables. The main class does all of the looping and updates all of the respective objects that are active. There are times in this loop that I need to delete an object of a vector that is deep inside a recursive set of containers (As shown in the code provided). It would be extremely tedious to repeatedly have to reference the necessary variable as a pointer to another pointer (and so on) to point to the specific object I need, and later erase it (this was the concept I used before switching to C++11) so instead I have a range for loop (also shown in the code). My example code shows the idea that I have in place, where I want to cut down on the tedium as well as make the code a lot more readable.
This is the example code:
struct item{
int stat;
};
struct character{
int otherStat;
std::vector<item> myItems;
};
struct charContainer{
std::map<int, character> myChars;
};
int main(){
//...
charContainer box;
//I want to do something closer to this
for(item targItem: box.myChars[iter].myItems){
//Then I only have to use targItem as the reference
if(targItem.isFinished)
box.myChars[iter].myItems.erase(targItem);
}
//Instead of doing this
for(int a=0;a<box.myChars[iter].myItems.size();a++){
//Then I have to repeatedly use box.myChars[iter].myItems[a]
if(box.myChars[iter].myItems[a].isFinished)
box.myChars[iter].myItems.erase(box.myChars[iter].myItems[a]);
}
}
TLDR: I want to remove the tedium of repeatedly calling the full reference by using the new range for loops shown in C++11.
EDIT: I am not trying to delete the elements all at once. I am asking how I would delete them in the manner of the first loop. I am deleting them when I am done with them externally (via an if statement). How would I delete specific elements, NOT all of them?
If you simply want to clear an std::vector, there is a very simple method you can use:
std::vector<item> v;
// Fill v with elements...
v.clear(); // Removes all elements from v.
In addition to this, I'd like to point out that [1] to erase an element in a vector requires the usage of iterators, and [2] even if your approach was allowed, erasing elements from a vector inside a for loop is a bad idea if you are not careful. Suppose your vector has 5 elements:
std::vector<int> v = { 1, 2, 3, 4, 5 };
Then your loop would have the following effect:
First iteration: a == 0, size() == 5. We remove the first element, then the vector will contain {2, 3, 4, 5}
Second iteration: a == 1, size() == 4. We then remove the second element, then the vector will contain {2,4,5}
Third iteration: a == 2, size() == 3. We remove the third element, and we are left with the final result {2,4}.
Since this does not actually empty the vector, I suppose it is not what you were looking for.
If instead you have some particular condition that you want to apply to remove the elements, it is very easily applied in C++11 in the following way:
std::vector<MyType> v = { /* initialize vector */ };
// The following is a lambda, which is a function you can store in a variable.
// Here we use it to represent the condition that should be used to remove
// elements from the vector v.
auto isToRemove = [](const MyType & value){
return /* true if to remove, false if not */
};
// A vector can remove multiple elements at once using its erase() method,
// which removes all elements within a specified range. We use it together
// with a standard library algorithm: remove_if.
// remove_if shifts every element for which the predicate returns false to the
// front of the range and returns an iterator to the new logical end; the
// leftover slots after that iterator are what erase() then discards.
v.erase( std::remove_if( std::begin(v), std::end(v), isToRemove ), std::end(v) );
// Done!
I am deleting them when I am done with them externally (via an if statement). How would I delete specific elements, NOT all of them?
In my opinion, you're looking at this the wrong way. Writing loops to delete items from a sequence container is always problematic and not recommended. Strive to stay away from removing items in this fashion.
When you work with containers, you should strategically set up your code so that you place the deleted or "about to be deleted" items in a part of the container that is easily accessed, away from the items in the container that you do not want to delete. At the time you actually do want to remove them, you know where they are and thus can call some function to expel them from the container.
One answer was already given, and that is to use the erase-remove(if) idiom. When you call remove or remove_if, the items you want to keep are shifted to the front of the container, and the tail is left holding leftover values in a valid but unspecified state. The return value of remove(_if) is an iterator to the start of that tail. Then you feed this iterator to the vector::erase method to delete those leftovers permanently from the container.
The other solution (but probably less used) is the std::partition algorithm. The std::partition also can move the "bad" items to the end of the container, but unlike remove(_if), the items are still valid (i.e. you can leave them at the end of the container and still use them safely). Then later on, you can remove them as you wish in a separate step since std::partition also returns an iterator.
Why not use a standard iterator to iterate over the vector? That way you can delete the element by passing the iterator to .erase(), which returns an iterator to the next element. When that iterator equals the vector's end(), your loop exits.
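A minimal sketch of that iterator loop follows. The Item struct with an isFinished flag is an assumption (the question's item struct only declares stat), and erase_finished is a made-up name.

```cpp
#include <vector>

// Assumed element type; isFinished is not in the question's struct.
struct Item {
    int stat;
    bool isFinished;
};

// Hypothetical helper: erases finished items one at a time while iterating.
void erase_finished(std::vector<Item>& items) {
    for (auto it = items.begin(); it != items.end(); /* advanced in the body */) {
        if (it->isFinished) {
            // erase() invalidates `it`, so continue from the iterator it returns.
            it = items.erase(it);
        } else {
            ++it;
        }
    }
}
```

Binding the nested vector to a reference first, for example auto& items = box.myChars[iter].myItems;, also removes the need to repeat the full access path. Each erase still shifts the following elements, so for many removals the erase-remove idiom above does less work.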

How to do fast sorting in sorted list when only one element is changed

I need a list of elements that is always sorted. The operations involved are quite simple; for example, if the list is sorted from high to low, I only need three operations in some loop task:
while true do {
list.sort() //sort the list that has hundreds of elements
val = list[0] //get the first/maximum value in the list
list.pop_front() //remove the first/maximum element
...//do some work here
list.push_back(new_elem)//insert a new element
list.sort()
}
However, since I only add one element at a time and I am concerned about speed, I don't want the sort to go through all the elements, e.g., with a bubble sort. So I wonder whether there is a function to insert the element in order, or whether list::sort() is smart enough to use some kind of quick sort when only one element is added or modified.
Or should I use a deque for better performance, if the above are all the operations needed?
Thanks a lot!
As mentioned in the comments, if you aren't locked into std::list then you should try std::set or std::multiset.
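As a rough sketch of the std::multiset route, assuming int elements ordered from high to low with std::greater (the function name pop_max_and_insert is made up):

```cpp
#include <cassert>
#include <functional>
#include <set>

// Hypothetical helper: removes and returns the current maximum, then inserts
// new_elem. The multiset keeps itself ordered, so no explicit sort is needed.
int pop_max_and_insert(std::multiset<int, std::greater<int>>& s, int new_elem) {
    assert(!s.empty());   // precondition: there is a maximum to take
    int val = *s.begin(); // with std::greater, begin() is the largest element
    s.erase(s.begin());   // erases only that one element, not all equal values
    s.insert(new_elem);   // O(log n) ordered insert
    return val;
}
```

Erasing through the iterator rather than by value matters here, because erase(value) on a multiset removes every element equal to that value.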
The std::list::insert method takes an iterator which specifies where to add the new item. You can use std::lower_bound to find the correct insertion point; it's not optimal without random access iterators but it still only does O(log n) comparisons.
P.S. don't use variable names that collide with built-in classes like list.
lst.sort(std::greater<T>()); // sort once, from high to low
while (true) {
    T val = lst.front(); // get the first/maximum value in the list
    lst.pop_front(); // remove the first/maximum element
    ... // do some work here
    // Find the first position whose element is not greater than new_elem.
    auto it = std::lower_bound(lst.begin(), lst.end(), new_elem, std::greater<T>());
    lst.insert(it, new_elem); // insert the new element in order
    // lst is already sorted
}

Append two containers in constant time

I am looking for a way to append two containers in constant (or at least minimal linear) time.
I noticed linked lists merge, but it seems to sort the elements. Isn't there a container/method to just re-link a container to another one (say, like list1.last_element.next = list2.first_element)?
You can use the std::list::splice method:
std::list<int> list1;
std::list<int> list2;
list1.splice(list1.end(), list2, list2.begin(), list2.end());
This code appends the contents of the list2 to the end of list1.
As Dietmar Kuhl mentioned, the method needs to count the elements in the range you are inserting:
[list2.begin(), list2.end())
so if you provide a range, the complexity is linear. However if you know that you want to append an entire list you can simply do
list1.splice(list1.end(), list2);
in O(1) time.
For std::list<T> there is splice(), which can be used to transfer nodes from one list to another. Sadly, this method got broken: it is now linear in the length of the spliced sequence when a range is specified with two iterators and the splice is between two different std::list<T> objects. This change was made in favor of having a constant-time size() operation.
std::list<int> l1({ 1, 2, 3 });
std::list<int> l2({ 4, 5, 6 });
l1.splice(l1.end(), l2);

Problem with invalidation of STL iterators when calling erase

The STL standard defines that when an erase occurs on containers such as std::deque, std::list etc iterators are invalidated.
My question is as follows: assuming a list of integers contained in a std::deque, and a pair of indices indicating a range of elements in the std::deque, what is the correct way to delete all even elements?
So far I have the following, however the problem here is that the assumed end is invalidated after an erase:
#include <cstddef>
#include <deque>
int main()
{
std::deque<int> deq;
for (int i = 0; i < 100; deq.push_back(i++));
// range, 11th to 51st element
std::pair<std::size_t,std::size_t> r(10,50);
std::deque<int>::iterator it = deq.begin() + r.first;
std::deque<int>::iterator end = deq.begin() + r.second;
while (it != end)
{
if (*it % 2 == 0)
{
it = deq.erase(it);
}
else
++it;
}
return 0;
}
Examining how std::remove_if is implemented, it seems there is a very costly copy/shift-down process going on.
Is there a more efficient way of achieving the above without all the copies/shifts?
In general, is deleting/erasing an element more expensive than swapping it with the next nth value in the sequence (where n is the number of elements deleted/removed so far)?
Note: Answers should assume the sequence size is quite large, +1mil elements and that on average 1/3 of elements would be up for erasure.
I'd use the Erase-Remove Idiom. I think the Wikipedia article linked even shows what you're doing -- removing odd elements.
The copying that remove_if does is no more costly than what happens when you delete elements from the middle of the container. It might even be more efficient.
Calling .erase() also results in "a very costly copy/shift down process going on.". When you erase an element from the middle of the container, every other element after that point must be shifted down one spot into the available space. If you erase multiple elements, you incur that cost for every erased element. Some of the non-erased elements will move several spots, but are forced to move one spot at a time instead of all at once. That is very inefficient.
The standard library algorithms std::remove and std::remove_if optimize this work. They use a clever trick to ensure that every moved element is only moved once. This is much, much faster than what you are doing yourself, contrary to your intuition.
The pseudocode is like this:
read_location <- beginning of range.
write_location <- beginning of range.
while read_location != end of range:
if the element at read_location should be kept in the container:
copy the element at the read_location to the write_location.
increment the write_location.
increment the read_location.
As you can see, every element in the original sequence is considered exactly once, and if it needs to be kept, it gets copied exactly once, to the current write_location. It will never be looked at again, because the write_location can never run in front of the read_location.
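The pseudocode above can be sketched in C++ roughly as follows. This is an illustration of the read/write technique, not the standard library's actual source; compact_if_not and erase_even_in_range are made-up names, and the deque of int matches the question's setup.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>

// Hypothetical stand-in for std::remove_if: keeps the elements for which
// should_remove is false, moving each at most once, and returns the new end.
template <typename Iter, typename Pred>
Iter compact_if_not(Iter first, Iter last, Pred should_remove) {
    Iter write = first;
    for (Iter read = first; read != last; ++read) {
        if (!should_remove(*read)) {
            if (write != read) {
                *write = std::move(*read);  // moved straight to its final slot
            }
            ++write;
        }
    }
    return write;
}

// Erases the even values in the index range [from, to) of deq.
void erase_even_in_range(std::deque<int>& deq, std::size_t from, std::size_t to) {
    assert(from <= to && to <= deq.size());
    auto first = deq.begin() + static_cast<std::ptrdiff_t>(from);
    auto last = deq.begin() + static_cast<std::ptrdiff_t>(to);
    auto new_end = compact_if_not(first, last, [](int x) { return x % 2 == 0; });
    // A single erase call closes the gap once, however many elements were dropped.
    deq.erase(new_end, last);
}
```

In real code, std::remove_if can replace compact_if_not directly. Calling erase once on [new_end, last) also sidesteps the invalidated end iterator from the question, since both iterators are used before anything is erased.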
Remember that although deque is not one contiguous block like vector (it is typically implemented as a sequence of fixed-size chunks), it is still an array-like container, so removing elements mid-container necessarily means shifting the elements on one side of the hole over it. You just want to make sure you're doing one iteration and copying each not-to-be-deleted object directly to its final position, rather than moving all objects one by one during each delete. remove_if is efficient and appropriate in this regard; your erase loop is not: it does massive amounts of unnecessary copying.
FWIW - alternatives:
add a "deleted" state to your objects and mark them deleted in place, but then every time you operate on the container you'll need to check yourself
use a list, which is implemented using pointers to previous and next elements, such that removing a list element alters the adjacent pointers to bypass that element: no copying, efficient iteration, just no random access, more small (i.e. inefficient) heap allocations and pointer overheads
What to choose depends on the nature, relative frequency, and performance requirements of specific operations (e.g. it may be that you can afford slow removals if they're done at non-critical times, but need fastest-possible iteration - whatever it is, make sure you understand your needs and the implications of the various data structures).
One alternative that hasn't been mentioned is to create a new deque, copy the elements that you want to keep into it, and swap it with the old deque.
#include <algorithm>
#include <cstddef>
#include <deque>
#include <iterator>
#include <utility>

void filter(std::deque<int>& in, std::pair<std::size_t,std::size_t> range) {
    std::deque<int> out;
    std::deque<int>::const_iterator first = in.begin();
    std::deque<int>::const_iterator curr = first + range.first;
    std::deque<int>::const_iterator last = first + range.second;
    // Note: std::deque has no reserve(); it grows chunk by chunk.
    std::copy(first, curr, std::back_inserter(out)); // elements before the range
    while (curr != last) {
        if (*curr & 1) { // keep the odd values
            out.push_back(*curr);
        }
        ++curr;
    }
    std::copy(last, in.end(), std::back_inserter(out)); // elements after the range
    in.swap(out);
}
I'm not sure if you have enough memory to create a copy, but it usually is faster and easier to make a copy instead of trying to erase elements in place from a large collection. Note that std::deque has no reserve(). If you still see memory thrashing and want a single up-front allocation, use a std::vector for the output instead: figure out how many elements you are going to keep by calling std::count_if and reserve that many.