Append two containers in constant time - c++

I am looking for a way to append two containers in constant (or at least minimal linear) time.
I noticed std::list::merge, but it seems to sort the elements. Isn't there a container/method to just re-link a container to another one (say, like list1.last_element.next = list2.first_element)?

You can use the std::list::splice method:
std::list<int> list1;
std::list<int> list2;
list1.splice(list1.end(), list2, list2.begin(), list2.end());
This code appends the contents of the list2 to the end of list1.
As Dietmar Kuhl mentioned, the method needs to count the elements in the range you are inserting:
[list2.begin(), list2.end())
so if you splice a range from a different list, the complexity is linear. However, if you know that you want to append an entire list, you can simply do
list1.splice(list1.end(), list2);
in O(1) time.
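A minimal sketch of the whole-list overload, assuming `int` elements; `append` is a hypothetical helper name. No elements are copied, and iterators into `list2` stay valid and afterwards refer into `list1`.

```cpp
#include <list>

// Sketch: append list2 to list1 by relinking nodes, not by copying elements.
// The whole-list overload of std::list::splice runs in O(1).
void append(std::list<int>& list1, std::list<int>& list2) {
    list1.splice(list1.end(), list2);  // afterwards list2 is empty
}
```

For example, with `a = {1, 2, 3}` and `b = {4, 5, 6}`, `append(a, b)` leaves `a = {1, 2, 3, 4, 5, 6}` and `b` empty.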

For std::list<T> there is splice(), which can be used to transfer nodes from one list to another. Sadly, this method became linear in the length of the spliced sequence when you specify a range using two iterators and splice() between two different std::list<T> objects. This change was made in favor of having a constant-time size() operation. Splicing an entire list is still constant time:
std::list<int> l1({ 1, 2, 3 });
std::list<int> l2({ 4, 5, 6 });
l1.splice(l1.end(), l2);

Related

Why isn't std::list::splice a free function?

Splice is a member function that puts part of a linked list into another linked list in constant time.
Why does it need to be a member function? I would expect to be able to splice with just iterators into the lists, without having a handle on the list itself. Why should the list to be spliced need to be an argument in addition to the start and end iterators?
For testing, I made three lists and mixed up the containers and iterators.
See in the splice below where the containers (empty) don't match the iterators (test0 and test1):
list<int> test0;
list<int> test1;
list<int> empty;
test0.push_back(1);
test0.push_back(2);
test0.push_back(3);
test1.push_back(4);
test1.push_back(5);
test1.push_back(6);
empty.splice(test0.end(), empty, test1.begin(), test1.end());
printf("empty size: %zu\n", empty.size());
printf("test0 size: %zu\n", test0.size());
printf("test1 size: %zu\n", test1.size());
for (const auto& i : test0) {
printf("%d\n", i);
}
Surprisingly, it all worked fine, even the size!
empty size: 0
test0 size: 6
test1 size: 0
1
2
3
4
5
6
I can somewhat understand the iteration working because it just runs until next is null, without regard to the container's front/back pointers. But how did it get the size right? Maybe size is calculated dynamically?
Edit: Based on this explanation of size, size is calculated dynamically for lists in my implementation, in linear time (this was true of some pre-C++11 implementations; C++11 requires size() to be constant time). So the container is really just a dummy argument. Maybe it's only needed when adding new elements because it has the allocator for making new nodes in the list?
std::list::splice modifies the size of the container. You can't use a container's iterators to modify its size. You'll notice that there are no free functions in the standard library that can insert new elements into a range using only iterators. At best they can rearrange them.
For example, std::remove moves the elements to keep to the front of the range and returns an iterator to the new logical end; the elements past that iterator still have to be erased by the container. It can't really remove elements from the range itself.
There are some workarounds, such as using std::back_inserter, but that works by holding a pointer to the container and simulating an unbounded range.
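To illustrate the point, here is a sketch of the usual erase-remove idiom, assuming a `std::vector<int>` and a hypothetical helper `remove_value`: `std::remove` works only through iterators and cannot change the size, so the container's own `erase()` has to finish the job.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch: std::remove only sees iterators, so it cannot shrink the vector.
// It moves the elements to keep to the front and returns the new logical end;
// the container's erase() then drops the leftover tail.
std::size_t remove_value(std::vector<int>& v, int value) {
    auto new_end = std::remove(v.begin(), v.end(), value);
    std::size_t removed = static_cast<std::size_t>(v.end() - new_end);
    v.erase(new_end, v.end());  // only the container can change its size
    return removed;
}
```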
I was looking at the std::list::splice implementation the other day.
Typically the iterator abstracts pointers to the list node private implementation. The list nodes contain the _M_prev and _M_next pointers to their neighbor nodes - this is purely implementation dependent. For an empty list, the list contains a sentinel node which serves as both head and tail (again implementation dependent).
So I thought I would try to implement splice using only the list nodes:
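void splice(const_iterator pos, list& other,
            const_iterator first, const_iterator last)
{
    // Illustrative only: how nodes are reached from iterators is implementation-specific.
    if (first == last) return;            // nothing to move
    auto* f = first._M_node;              // first node moved
    auto* l = last._M_node->_M_prev;      // last node moved: [first, last) is half-open
    // Unhook [f, l] from its current list.
    f->_M_prev->_M_next = last._M_node;
    last._M_node->_M_prev = f->_M_prev;
    // Hook [f, l] in just before pos.
    auto* p = pos._M_node;
    f->_M_prev = p->_M_prev;
    l->_M_next = p;
    p->_M_prev->_M_next = f;
    p->_M_prev = l;
}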
I think that looks correct, but I could be wrong.
So based on that implementation, and if size is calculated dynamically, then that would work as you said.
However as François Andrieux pointed out, size being calculated dynamically would be wasteful, and so the container needs to be involved so that the internal size count can be maintained.
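The following toy list (a hypothetical `ToyList`/`Node`, not `std::list` or any real implementation) sketches the same argument: the pointer surgery in `splice` never reads `other`, but keeping a cached O(1) `size()` correct for both lists requires access to `other`'s counter, and counting the moved range is what makes the range overload linear.

```cpp
#include <cstddef>

// Hypothetical toy list (not std::list): circular, with a sentinel node and
// a cached element count, as C++11 effectively requires of std::list.
struct Node {
    Node* prev;
    Node* next;
    int value;
};

class ToyList {
public:
    ToyList() { head_.prev = head_.next = &head_; }
    ~ToyList() {
        for (Node* n = head_.next; n != &head_;) {
            Node* next = n->next;
            delete n;
            n = next;
        }
    }
    ToyList(const ToyList&) = delete;
    ToyList& operator=(const ToyList&) = delete;

    void push_back(int v) {
        link_before(&head_, new Node{nullptr, nullptr, v});
        ++size_;
    }
    std::size_t size() const { return size_; }  // O(1): just reads the counter
    Node* begin() { return head_.next; }
    Node* end() { return &head_; }

    // Move the half-open range [first, last) of `other` to just before `pos`.
    // The relinking never touches `other`; it is needed only so that both
    // cached counts can be updated, which costs an O(n) count here.
    void splice(Node* pos, ToyList& other, Node* first, Node* last) {
        if (first == last) return;
        std::size_t n = 0;
        for (Node* it = first; it != last; it = it->next) ++n;
        Node* tail = last->prev;   // last node actually moved
        first->prev->next = last;  // unhook [first, tail] from `other`
        last->prev = first->prev;
        first->prev = pos->prev;   // hook it in before `pos`
        tail->next = pos;
        pos->prev->next = first;
        pos->prev = tail;
        other.size_ -= n;
        size_ += n;
    }

private:
    static void link_before(Node* pos, Node* n) {
        n->prev = pos->prev;
        n->next = pos;
        pos->prev->next = n;
        pos->prev = n;
    }

    Node head_{nullptr, nullptr, 0};  // sentinel: serves as both head and tail
    std::size_t size_ = 0;
};
```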

C/C++ - Efficient way to compare two lists and find missing elements

I have two lists, L1 and L2, of data containing multiple elements, each unique, of an abstract data type (i.e., structs). Each of the two lists:
May contain between zero and one-hundred (inclusive) elements.
Contains no duplicate elements (each element is unique).
May or may not contain elements in the other list (i.e., L1 and L2 might be identical, or contain completely different elements).
Is not sorted.
At the lowest level, is stored within a std::vector<myStruct> container.
What I am typically expecting is that periodically, a new element is added to L2, or an element is subtracted/removed from it. I am trying to detect the differences in the two lists as efficiently (i.e., with minimal comparisons) as possible:
If an entry is not present in L2 and is present in L1, carry out one operation: Handle_Missing_Element().
If an entry is present in L2 and not present in L1, carry out another operation: Handle_New_Element().
Once the above checks are carried out, L1 is set to be equal to L2, and at some time in the future, L2 is checked again.
How could I go about finding out the differences between the two lists? There are two approaches I can think of:
Compare both lists via every possible combination of elements. Possibly O(n^2) execution complexity (horrible).
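// Compare every element of L2 against every element of L1: O(|L1| * |L2|).
// (A second pass with L1 and L2 swapped would find the missing elements.)
for (const auto& b : L2) {
    bool found = false;
    for (const auto& a : L1) {
        if (a == b) {    // found duplicate entry
            found = true;
            break;
        }
    }
    if (!found) Handle_New_Element(b);
}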
Sort the lists, and compare the two lists element-wise until I find a difference. This seems like it would be in near-linear time. The problem is that I would need the lists to be sorted. It would be impractical to manually sort the underlying vector after each addition/removal for the list. It would only be reasonable to do this if it were somehow possible to force vector::push_back() to automatically insert elements such that insertions preserve the sorting of the list.
Is there a straightforward way to accomplish this efficiently in C++? I've found similar such problems, but I need to do more than just find the intersection of two sets, or do such a test with just a set of integers, where sum-related tricks can be used, as I need to carry out different operations for "new" vs "missing" elements.
Thank you.
It would be impractical to manually sort the underlying vector after each addition/removal for the list. It would only be reasonable to do this if it were somehow possible to force vector::push_back() to automatically insert elements such that insertions preserve the sorting of the list.
What you're talking about here is an ordered insert. There are functions in <algorithm> that allow you to do this. Rather than using std::vector::push_back, you would use std::vector::insert, and call std::lower_bound, which does a binary search for the first element not less than a given value.
auto insert_pos = std::lower_bound( L2.begin(), L2.end(), value );
if( insert_pos == L2.end() || *insert_pos != value )
{
L2.insert( insert_pos, value );
}
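Finding the position this way takes O(log N) comparisons, although vector::insert still shifts the elements after it, so each insertion is O(N) in the worst case. With at most a hundred elements that is cheap, and if you are doing fewer than N insertions between your periodic checks, it ought to be an improvement over re-sorting.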
The zipping operation might look something like this:
auto it1 = L1.begin();
auto it2 = L2.begin();
while( it1 != L1.end() && it2 != L2.end() )
{
if( *it1 < *it2 ) {
Handle_Missing( *it1++ );
} else if( *it2 < *it1 ) {
Handle_New( *it2++ );
} else {
it1++;
it2++;
}
}
while( it1 != L1.end() ) Handle_Missing( *it1++ );
while( it2 != L2.end() ) Handle_New( *it2++ );
Can you create a hash value for your list items? If so, just compute the hash and check the hash table for the other list. This is quick, does not require sorting, and avoids your "every possible combination" problem. If you're using C++ and the STL, you could use a map container (std::unordered_map is the hash-table version) to hold each list.
Create a hash for each item in L1, and use a map to associate it with your list item.
Create a similar map for L2, and as each L2 hash is created, check to see if it's in the L1 map.
When a new element is added to L2, calculate its hash value and check to see if it's in the L1 hash map (using map.find() if using STL maps). If not, carry out your Handle_New_Element() function.
When an element is removed from the L2 list and its hash is in the L1 hash map, carry out your Handle_Missing_Element() function.
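A sketch of the hash-table idea, swapping in `std::unordered_set` (the standard hash-table container) for the map and assuming `std::string` elements; a struct would need its own hash functor and `operator==`. `Diff` and `diff_hashed` are hypothetical names, and the two output vectors stand in for calls to Handle_Missing_Element() and Handle_New_Element(). Expected cost is O(|L1| + |L2|).

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical result type: what the two handlers would be called on.
struct Diff {
    std::vector<std::string> missing;  // in L1 but not in L2
    std::vector<std::string> added;    // in L2 but not in L1
};

// Sketch: build a hash set for each list, then probe each element against
// the other list's set (expected O(1) per lookup).
Diff diff_hashed(const std::vector<std::string>& L1,
                 const std::vector<std::string>& L2) {
    const std::unordered_set<std::string> in1(L1.begin(), L1.end());
    const std::unordered_set<std::string> in2(L2.begin(), L2.end());
    Diff d;
    for (const auto& x : L1)
        if (in2.count(x) == 0) d.missing.push_back(x);
    for (const auto& x : L2)
        if (in1.count(x) == 0) d.added.push_back(x);
    return d;
}
```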
A container that automatically sorts itself on inserts is std::set. Insertions will be O(log n), and comparing the two sets will be O(n). Since all your elements are unique you don't need std::multiset.
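A sketch of the std::set approach, assuming `int` stand-ins for the structs (a struct would need an `operator<`); `only_in` is a hypothetical helper. Because both sets are already sorted, `std::set_difference` performs the linear, merge-style comparison:

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <vector>

// Sketch: elements of `a` that are not in `b`, in ascending order.
// Both inputs are sorted sets, so this is a single linear pass.
std::vector<int> only_in(const std::set<int>& a, const std::set<int>& b) {
    std::vector<int> out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::back_inserter(out));
    return out;
}
```

With this helper, `only_in(L1, L2)` yields the missing elements and `only_in(L2, L1)` the new ones.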
For each element of both arrays maintain number of times it is met in the opposite array. You can store these numbers in separate arrays with same indexing, or in the structs you use.
When an element x is inserted into L2, you have to check it for equality with all the elements of L1. On each equality with y, increment counters of both elements x and y.
When an element x is removed from L2, you have to again compare it with all the elements of L1. On each equality with y from L1, decrement counter of y. Counter of x does not matter, since it is removed.
When you want to find non-duplicate elements, you can simply iterate over both arrays. The elements with zero counters are the ones you need.
In total, you need O(|L1|) additional operations per insert and remove, and O(|L1| + |L2|) operations per duplication search. The latter can be reduced to the number of sought-for non-duplicate elements, if you additionally maintain lists of all elements with zero counter.
EDIT: Oops, it seems that each counter is always either 0 or 1 because of uniqueness in each list.
EDIT2: As Thane Plummer has written, you can additionally use a hash table. If you create a hash table for L1, then you can do all the comparisons in insert and remove in expected O(1). BTW, since your L1 is constant, you can even create a perfect hash table for it to make things faster.

How to do fast sorting in sorted list when only one element is changed

I need a list of elements that is always sorted. The operation involved is quite simple; for example, if the list is sorted from high to low, I only need three operations in some loop task:
while true do {
list.sort() //sort the list that has hundreds of elements
val = list[0] //get the first/maximum value in the list
list.pop_front() //remove the first/maximum element
...//do some work here
list.push_back(new_elem)//insert a new element
list.sort()
}
However, since I only add one element at a time and I have speed concerns, I don't want the sort to go through all the elements (e.g., bubble sort). So I wonder if there is a function to insert the element in order, or whether list::sort() is smart enough to use some kind of quick sort when only one element is added/modified?
Or should I use deque for better performance if the above are all the operations needed?
Thanks a lot!
As mentioned in the comments, if you aren't locked into std::list then you should try std::set or std::multiset.
The std::list::insert method takes an iterator which specifies where to add the new item. You can use std::lower_bound to find the correct insertion point; it's not optimal without random-access iterators (it still advances the iterator O(n) steps), but it only does O(log n) comparisons.
P.S. Don't use variable names that collide with standard library class names like list.
lst.sort(std::greater<T>()); // sort the list (hundreds of elements) once, high to low
while (true) {
    T val = lst.front(); // get the first/maximum value in the list
    lst.pop_front();     // remove the first/maximum element
    // ... do some work here, producing new_elem ...
    auto it = std::lower_bound(lst.begin(), lst.end(), new_elem, std::greater<T>());
    lst.insert(it, new_elem); // insert the new element in order
    // lst is already sorted
}
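If you aren't locked into std::list, a sketch of the same loop with the suggested std::multiset might look like this. It assumes `int` elements; `process_max_first` is a hypothetical name, and `step` is a hypothetical stand-in for the "do some work here" part that produces the next element:

```cpp
#include <functional>
#include <set>
#include <vector>

// A multiset ordered high-to-low keeps itself sorted on every insert,
// so no separate sort() call is needed inside the loop.
using DescendingSet = std::multiset<int, std::greater<int>>;

std::vector<int> process_max_first(DescendingSet s, int rounds,
                                   const std::function<int(int)>& step) {
    std::vector<int> popped;
    for (int r = 0; r < rounds && !s.empty(); ++r) {
        int val = *s.begin();  // the maximum, because ordering is std::greater
        s.erase(s.begin());    // erase by iterator removes exactly one element
        popped.push_back(val);
        s.insert(step(val));   // O(log n); the set stays ordered
    }
    return popped;
}
```

Both the removal and the insertion are O(log n), instead of a full re-sort per iteration.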

Iterate through part of a list in C++

I am new to C++ and I am working on an implementation of merge sort to help familiarize myself with the language.
Currently I have a list of integers and I want to create 2 sublists and store the first half of the original list into a list named 'left' and the remaining half into a list named 'right'
Ex: Assume my original list data is 16, 24, 56, 12, 89; I want to iterate through this list adding
16, 24 to a new sublist 'left'
and adding 56, 12, 89 to sublist 'right'
So left would result in [16, 24]
and right would be [56, 12, 89]
Here is the code I have right now; what conditions should I write in my if statements?
('l' is the name of the list that is passed in the function's parameters)
list<int> left, right;
int midpt = l.size()/2;
for(listIt = l.begin(); listIt!= l.end(); listIt++){
if () left.push_back(*listIt);
if () right.push_back(*listIt);
}
This might be easier:
#include <iterator>
#include <list>
auto middle = std::next(l.begin(), l.size() / 2);
std::list<int> left(l.begin(), middle), right(middle, l.end());
This constructs two new lists, left and right, directly from the respective ranges. The std::next algorithm returns an iterator obtained by advancing the given iterator by the given number of steps. Note that std::list<int>::size() has constant runtime complexity as of C++11, though advancing the iterator to the middle takes a linear amount of work.
This would work.
list<int> left, right;
size_t midpt = l.size() / 2;
size_t count = 0;
for (auto listIt = l.begin(); listIt != l.end(); ++listIt) {
    if (count++ < midpt)
        left.push_back(*listIt);
    else
        right.push_back(*listIt);
}
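Use std::list::insert instead of a for-loop. Since list iterators are not random access, use std::next to find the split point. Try:
auto mid = std::next(l.begin(), midpt);
left.insert(left.end(), l.begin(), mid);
right.insert(right.end(), mid, l.end());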
Given that std::list defines a doubly linked list, it might be worth considering a somewhat different approach. Instead of finding the middle of the list, and adding what's left of that to one list, and what's right of it to the other, you might consider adding the first element to your left list, the last element to your right list, advance both iterators, and repeat (until the iterators meet).
Most of the other methods require a linear traversal of the list to find the middle, followed by linear traversal of both sides to build the two new lists. In other words, you traverse the list 1.5 times.
By traversing from both the front and back simultaneously, you can do the whole job by traversing the list only once. In terms of big-O complexity, these are equivalent (O(N) in both cases), but in more practical terms we can expect traversing the list once to be somewhere around half again faster than traversing it 1.5 times (though depending on the list size, caching will probably affect that to at least some degree).
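A minimal sketch of the two-ended traversal, assuming a `std::list<int>` and a hypothetical `split_halves` helper. It steps the back iterator first, so with an odd count the middle element lands in the right half, matching the question's [16, 24] / [56, 12, 89] example:

```cpp
#include <list>
#include <utility>

// Sketch: split a list into halves in one pass, walking in from both ends.
std::pair<std::list<int>, std::list<int>> split_halves(const std::list<int>& l) {
    std::list<int> left, right;
    auto front = l.begin();
    auto back = l.end();
    while (front != back) {
        --back;                    // step onto the next unvisited element from the back
        right.push_front(*back);   // the right half is built back-to-front
        if (front == back) break;  // the iterators met on the middle element
        left.push_back(*front);
        ++front;
    }
    return {std::move(left), std::move(right)};
}
```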

Why is erase() function so expensive?

Consider a 2D vector vector<vector<int>> N and let's say its contents are as follows:
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
So the size of N here is 4 i.e. N.size() = 4
Now, consider the following code :
int i = 0;
while(N != empty()){
N.erase(i);
++i;
}
I calculated the time just for this piece of code alone with various sizes for N and following are the results:
The size of N is 1000: execution time 0.230000 s
The size of N is 10000: execution time 22.900000 s
The size of N is 20000: execution time 91.760000 s
The size of N is 30000: execution time 206.620000 s
The size of N is 47895: execution time 526.540000 s
My question is: why is this function so expensive? If it is, then conditional erase statements in many programs could take forever just because of this function. It is the same case when I use the erase function in std::map too. Is there any alternative to this function? Do other libraries like Boost offer any?
Please do not say I could do N.erase() as a whole because I'm just trying to analyze this function.
Consider what happens when you delete the first element of a vector. The rest of the vector must be "moved" down by one index, which involves copying it. Try erasing from the other end, and see if that makes a difference (I suspect it will...)
Because your algorithm is O(n^2). Each call to erase forces the vector to move all elements after the erased element back. So in your loop with the 4 element vector, the first loop causes 3 elements to be shifted, the second iteration causes 1 element to be shifted, and after that you have undefined behavior.
If you had 8 elements, the first iteration would move 7 elements, the next would move 5 elements, the next would move 3 elements, and the final iteration would move 1 element. (And again you have undefined behavior.)
When you encounter situations like this, you should generally use the standard algorithms (i.e. std::remove, std::remove_if) instead, as they run through the container once and turn typical O(n^2) algorithms into O(n) algorithms. For more information, see Scott Meyers' "Effective STL" Item 43: Prefer algorithm calls to hand-written loops.
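A sketch of the one-pass approach on the question's `vector<vector<int>>`, using a hypothetical predicate (drop rows whose first value is even) and a hypothetical helper name. `std::remove_if` moves each surviving row at most once, and a single `erase` trims the tail, so the work is linear in the number of rows:

```cpp
#include <algorithm>
#include <vector>

// Sketch: remove every row whose first value is even, in a single pass,
// instead of calling erase() once per row inside a loop.
void erase_rows_with_even_head(std::vector<std::vector<int>>& N) {
    N.erase(std::remove_if(N.begin(), N.end(),
                           [](const std::vector<int>& row) {
                               return !row.empty() && row[0] % 2 == 0;
                           }),
            N.end());
}
```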
A std::vector is, internally, just an array of elements. If you delete an element in the middle, all the elements after it have to be shifted down. This can be very expensive - even more so if the elements have a custom operator= that does a lot of work!
If you need erase() to be fast, you should use a std::list - this will use a doubly linked list structure that allows fast erasure from the middle (however, other operations get somewhat slower). If you just need to remove from the start of the list quickly, use std::deque - it is typically implemented as an array of pointers to fixed-size blocks, and offers most of the speed advantages of std::vector while also allowing fast erasure from the beginning (but only from the beginning or end).
Furthermore, note that your loop there makes the problem worse - you first scan through all elements equal to zero and erase them. The scan takes O(n) time, the erasure also O(n) time. You then repeat for 1, and so on - overall, O(n^2) time. If you need to erase multiple values, you should take an iterator and go through the std::list yourself, using the iterator variant of erase(). Or if you use a vector, you'll find it can be faster to copy into a new vector.
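A sketch of the single-pass std::list loop, assuming `int` elements and a hypothetical `erase_all` helper. `list::erase(it)` unlinks one node in O(1) and returns an iterator to the next element, so the loop never touches an invalidated iterator. (For this simple case, `lst.remove(value)` does the same thing.)

```cpp
#include <list>

// Sketch: one pass over the list, erasing matching nodes as we go.
void erase_all(std::list<int>& lst, int value) {
    for (auto it = lst.begin(); it != lst.end();) {
        if (*it == value) {
            it = lst.erase(it);  // O(1) unlink; returns the next position
        } else {
            ++it;
        }
    }
}
```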
As for std::map (and std::set) - this isn't a problem at all. std::map is capable of both removing elements at random, as well as searching for elements at random, with O(lg n) time - which is quite reasonable for most uses. Even your naive loop there shouldn't be too bad; manually iterating through and removing everything you want to remove in one pass is somewhat more efficient, but not nearly to the extent that it is with std::list and friends.
vector.erase will shift all elements after i down by one position. This is an O(n) operation.
Additionally, you're passing vectors by value rather than by reference.
Your code also doesn't erase the entire vector.
For example:
i = 0
erase N[0]
N = {{2, 2, 2, 2}, {3, 3, 3, 3}, {4, 4, 4, 4}}
i = 1
erase N[1]
N = {{2, 2, 2, 2}, {4, 4, 4, 4}}
i = 2
erase N[2]: undefined behavior, because the maximum valid index is now 1
Lastly, I don't think that's the correct syntax for vector.erase(). You need to pass in an iterator to the element you want to erase, e.g. N.erase(N.begin() + i).
Try this:
vector<vector<int>> vectors; // still passing by value so it'll be slow, but at least erases everything
for (int i = 0; i < 1000; ++i)
{
    vector<int> temp;
    for (int j = 0; j < 1000; ++j)
    {
        temp.push_back(i);
    }
    vectors.push_back(temp);
}
// erase starting from the beginning
while (!vectors.empty())
{
    vectors.erase(vectors.begin());
}
You can also compare this to erasing from the end (it should be significantly faster, especially when using values rather than references):
// just replace the while-loop at the end
while(!vectors.empty())
{
vectors.erase(vectors.end()-1);
}
A vector is an array that grows automatically as you add elements to it. As such, elements in a vector are contiguous in memory. This allows constant-time access to an element. Because vectors grow from the end, adding or removing an element at the end takes amortized constant time.
Now, what happens when you remove in the middle? Well, it means whatever exists after the erased element must be shifted back one position. This is very expensive.
If you want to do lots of insertion/removal in the middle, use a linked list such as std::list; std::deque only makes insertion and removal cheap at the two ends.
As Oli said, erasing from the first element of a vector means the elements following it have to be copied down in order for the array to behave as desired.
This is why linked lists are used for situations in which elements will be removed from random locations in the list - it is quicker (on larger lists) because there is no copying, only resetting some node pointers.