// set::insert (C++98)
#include <iostream>
#include <set>

int main ()
{
  std::set<int> myset;
  std::set<int>::iterator it;
  std::pair<std::set<int>::iterator,bool> ret;

  // set some initial values:
  for (int i=1; i<=5; ++i) myset.insert(i*10);    // set: 10 20 30 40 50

  ret = myset.insert(20);               // no new element inserted
  if (ret.second==false) it=ret.first;  // "it" now points to element 20

  myset.insert (it,25);                 // max efficiency inserting
  myset.insert (it,24);                 // max efficiency inserting
  myset.insert (it,26);                 // no max efficiency inserting

  int myints[]= {5,10,15};              // 10 already in set, not inserted
  myset.insert (myints,myints+3);

  std::cout << "myset contains:";
  for (it=myset.begin(); it!=myset.end(); ++it)
    std::cout << ' ' << *it;
  std::cout << '\n';

  return 0;
}
I saw this code as an example on the cplusplus reference site. It says

myset.insert (it,25);                 // max efficiency inserting
myset.insert (it,24);                 // max efficiency inserting

that these are "max efficiency" inserts, but I don't get it.
Can anybody tell me why?
std::set uses a balanced tree structure. When you call insert, you are allowed to provide a hint to the implementation, which it can use to speed up insertion.
Think of how a general insert into a regular binary search tree works. You start at the root node, and you must progress down using the usual comparisons:
void insert(node*& current, const T& value)
{
    if(current == nullptr) current = new node(value); // construct our new node here
    else if(value < current->value) insert(current->left, value);
    else if(value > current->value) insert(current->right, value);
}

void insert(const T& value)
{
    insert(root, value);
}
In a balanced tree, this must perform (on average) O(log n) comparisons to insert a given value.
However, suppose that, instead of starting at the root node, we give the implementation a starting node that is where the actual insert will happen. For example, in the above, we know that 24 and 25 will become children of the node containing 20. Hence, if we start at that node, we don't need to do our O(log n) comparisons - we can simply insert our nodes straight away. This is what is meant by "maximum efficiency" insertion.
Have you read the notes on that reference page?

position
    Hint for the position where the element can be inserted.
    C++98: The function optimizes its insertion time if position points to the element that will precede the inserted element.

it points to 20, which precedes 25.
In general, std::set has to look up where it inserts; this
look-up is O(lg n). If you provide a "hint" (the additional
iterator) for where the insertion should take place, the
implementation will first check whether this hint is correct (which
can be done in O(1)), and if it is, insert there, thus skipping
the O(lg n) look-up. If the hint isn't correct, it simply
reverts to the insertion as if it hadn't been given the hint.
There are two cases where you regularly use the hint. The first
is when you are inserting a sequence of already sorted data; in
this case, the hint is the end iterator, since if the data is
already sorted, each new value will in fact be inserted at the
end. The second is when copying into the set using an insertion
iterator; in this case, the "hint" (which can be anything)
is largely there for syntactical reasons: you can't use
a back_insert_iterator or a front_insert_iterator,
because std::set doesn't have push_back or push_front, and
the plain insert_iterator requires an iterator to tell it
where to insert; that iterator is what gets passed to insert.
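For illustration, here is a minimal sketch of both uses (the container names and data are made up for the example; the C++98 style matches the snippet at the top):

#include <algorithm>
#include <iterator>
#include <set>
#include <vector>

int main()
{
    std::vector<int> sorted;
    for (int i = 1; i <= 5; ++i) sorted.push_back(i);   // already-sorted input data

    // Case 1: inserting already sorted data, passing end() as the hint.
    std::set<int> a;
    for (std::vector<int>::iterator it = sorted.begin(); it != sorted.end(); ++it)
        a.insert(a.end(), *it);   // the hint is right every time, so each insert is cheap

    // Case 2: copying through an insertion iterator; std::inserter needs *some*
    // iterator, and that iterator is what gets passed on to insert() as the hint.
    std::set<int> b;
    std::copy(sorted.begin(), sorted.end(), std::inserter(b, b.end()));

    return 0;
}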
Related
I have a loop like this (where mySet is a std::set):
for(auto iter=mySet.begin(); iter!=mySet.end(); ++iter){
if (someCondition){mySet.insert(newElement);}
if (someotherCondition){mySet.insert(anothernewElement);}
}
I am experiencing some strange behavior, and I am asking myself whether this could be due to the new element being inserted "before" the current iterator position in the loop. Namely, I have an iteration where both conditions are true, but still the distance
distance(iter, mySet.end())
is only 1, not 2 as I would expect. Is my guess about set behavior right? And more importantly, can I still do what I want to do?
What I'm trying to do is to build "chains" on a hexagonal board between fields of the same color. I have a set containing all fields of my color, and the conditions check the color of neighboring fields; if a neighbor is of the same color, it is copied into mySet, extending the chain.
I am trying to use std::set for this because it prevents a field from appearing in the chain more than once. Reading the comments so far, I fear I need to switch to std::vector, where push_back() will surely add the element at the end, but then I will run into new problems from having to forbid duplicate elements. I am therefore hoping for advice on how best to solve this.
Depending on the new element's value, it may be inserted before or after the current iterator position. Below is an example of inserting before and after an iterator.
#include <iostream>
#include <iterator>
#include <set>

int main()
{
    std::set<int> s;
    s.insert(3);
    auto it = s.begin();
    std::cout << std::distance(it, s.end()) << std::endl; // prints 1

    s.insert(2); // 2 will be inserted before it
    std::cout << std::distance(it, s.end()) << std::endl; // prints 1

    s.insert(5); // 5 will be inserted after it
    std::cout << std::distance(it, s.end()) << std::endl; // prints 2
}
Regarding your question in the comments ("In my particular case, modifying it while iterating is basically exactly what I want, but of course I need to add everything after the current position"): no, you cannot manually arrange the order of the elements. A new value's position is determined by comparing it with the existing elements. Below is the quote from cppreference.
std::set is an associative container that contains a sorted set of unique objects of type Key. Sorting is done using the key comparison function Compare. Search, removal, and insertion operations have logarithmic complexity. Sets are usually implemented as red-black trees.
Thus, the implementation of the set will decide where exactly it will be placed.
If you really need to add values after the current position, you need to use a different container. For example, a plain vector would be suitable:
it = myvector.insert(it + 1, 200); // +1 to insert after it
If you have a small number of items, doing a brute-force check to see if they're inside a vector can actually be faster than checking if they're in a set. This is because vectors tend to have better cache locality than node-based containers such as std::set.
We can write a function to do this pretty easily:
#include <algorithm>
#include <vector>

template<class T>
void insert_unique(std::vector<T>& vect, T const& elem) {
    // Only append the element if it is not already present.
    if(std::find(vect.begin(), vect.end(), elem) == vect.end()) {
        vect.push_back(elem);
    }
}
I am writing a tree container at the moment (just for understanding and training), and by now I have a first and very basic approach to adding elements to the tree.
This is my tree code so far. No destructor, no cleanup and no element access yet.
template <class T> class set
{
public:
    struct Node
    {
        Node(const T& val)
            : left(0), right(0), value(val)
        {}
        Node* left;
        Node* right;
        T value;
    };

    set()
        : m_Root(nullptr)
    {}

    void add(const T& value)
    {
        if (m_Root == nullptr)
        {
            m_Root = new Node(value);
            return;
        }
        Node* next = nullptr;
        Node* current = m_Root;
        do
        {
            if (next != nullptr)
            {
                current = next;
            }
            next = value >= current->value ? current->left : current->right;
        } while (next != nullptr);
        value >= current->value ? current->left = new Node(value) : current->right = new Node(value);
    }

private:
    Node* m_Root;
};
Well, now I have tested the add performance against the insert performance of a std::set with unique and balanced (low and high) values, and came to the conclusion that the performance is simply awful.
Is there a reason why std::set inserts values that much faster, and what would be a decent way of improving the insert performance of my approach? (I know there might be better tree models, but as far as I know, insert performance should be broadly similar across most tree models.)
Under an i5 4570 at stock clocks,
the std::set needs 0.013s to add 1000000 int16 values.
My set needs 4.5s to add the same values.
Where does this big difference come from?
Update:
Alright, here is my test code:
int main()
{
    int n = 1000000;
    test::set<test::int16> mset;            // my set
    std::set<test::int16> sset;             // std set
    test::timer timer;                      // simple wrapper for clock()
    test::random_engine engine(0, 500000);  // simple wrapper for rand(); yes, it's seeded,
                                            // and yes, I am aware that an int16 will overflow
    std::set<test::int16> values;           // set of values, to ensure unique values
    bool flip = false;

    for (int i = 0; n > i; ++i)
    {
        values.insert(flip ? engine.generate() : 0 - engine.generate());
        flip = !flip; // ensure that we get high and low values and no straight line, but at least 2 paths
    }

    timer.start();
    for (std::set<test::int16>::iterator it = values.begin(); values.end() != it; ++it)
    {
        mset.add(*it);
    }
    timer.stop();
    std::cout << timer.totalTime() << "s for mset\n";

    timer.reset();
    timer.start();
    for (std::set<test::int16>::iterator it = values.begin(); values.end() != it; ++it)
    {
        sset.insert(*it);
    }
    timer.stop();
    std::cout << timer.totalTime() << "s for std\n";
}
The values set won't hold every generated value because of duplicates, but both containers receive a large number of values, the same values, in the same order, to ensure representative results. I know the test could be more accurate, but it should give some comparable numbers.
std::set implementations usually use a red-black tree data structure. It's a self-balancing binary search tree, and its insert operation is guaranteed to have O(log(n)) worst-case time complexity (as required by the standard). You use a simple binary search tree with an O(n) worst-case insert operation.
If you insert unique random values, such a big difference looks suspicious. But don't forget that randomness will not make your tree balanced, and the height of the tree could be much bigger than log(n).
Edit
It seems I have found the main problem with your code: you store all generated values in a std::set, and then add them to the test sets in increasing order. That degrades your set into a linked list.
The two obvious differences are:
the red-black tree (probably) used in std::set rebalances itself to put an upper bound on worst-case behaviour, exactly as DAle says.
If this is the problem, you should see it when plotting N (number of nodes inserted) against time-per-insert. You could also keep track of tree depth (at least for debugging purposes), and plot that against N.
the standard containers use an allocator which probably does something smarter than newing each node individually. You could try using std::allocator in your own container to see if that makes a significant improvement.
Edit 1: if you implemented a pool allocator, that's relevant information that should have been in the question.
Edit 2: now that you've added your test code, there's an obvious problem which means your set will always show worst-case insertion performance. You pre-sorted your input values! std::set is an ordered container, so staging your values in one first guarantees that you always insert in increasing value order, so your tree (which does not self-balance) degenerates into an expensive linked list, and your inserts are always linear rather than logarithmic time.
You can verify this by storing your values in a vector instead (just using the set to detect collisions), or by using an unordered_set to deduplicate without pre-sorting, as sketched below.
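For instance, a minimal sketch of the deduplicate-without-sorting idea (the function name, fixed seed, and value range here are made up for the example, not taken from the question):

#include <random>
#include <unordered_set>
#include <vector>

// Builds a list of n unique test values in generation order (not sorted),
// so that neither tree receives a pre-sorted input sequence.
std::vector<int> make_unique_values(int n)
{
    std::mt19937 gen(42);                                       // fixed seed for repeatability
    std::uniform_int_distribution<int> dist(-5000000, 5000000); // wide range so n unique values exist
    std::unordered_set<int> seen;                               // only used to reject duplicates
    std::vector<int> values;                                    // preserves the unsorted generation order
    values.reserve(n);
    while (static_cast<int>(values.size()) < n)
    {
        int v = dist(gen);
        if (seen.insert(v).second)  // true if v has not been seen before
            values.push_back(v);
    }
    return values;
}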
I'm looking for a data structure (array-like) that allows fast (faster than O(N)) arbitrary insertion of values into the structure. The data structure must be able to print out its elements in the way they were inserted. This is similar to something like List.Insert() (which is too slow as it has to shift every element over), except I don't need random access or deletion. Insertion will always be within the size of the 'array'. All values are unique. No other operations are needed.
For example, if Insert(x, i) inserts value x at index i (0-indexing). Then:
Insert(1, 0) gives {1}
Insert(3, 1) gives {1,3}
Insert(2, 1) gives {1,2,3}
Insert(5, 0) gives {5,1,2,3}
And it'll need to be able to print out {5,1,2,3} at the end.
I am using C++.
Use a skip list. Another option would be a tiered vector. The skip list performs inserts in O(log(n)) and keeps the numbers in order. The tiered vector supports insert in O(sqrt(n)) and again can print the elements in order.
EDIT: per the comment of amit, I will explain how you find the k-th element in a skip list:
For each element you have a tower of links to later elements, and for each link you know how many elements it jumps over. So, looking for the k-th element, you start with the head of the list and go down the tower until you find a link that jumps over no more than k elements. You follow that link and decrease k by the number of elements you jumped over. Continue doing that until k = 0.
Did you consider using std::map or std::vector?
You could use a std::map with the rank of insertion as key. And vector has a reserve member function.
You can use a std::map mapping (index, insertion-time) pairs to values, where insertion-time is an "autoincrement" integer (in SQL terms). The ordering on the pairs should be
(i, t) < (i*, t*)
iff
i < i*, or i = i* and t > t*
In code:
struct lt {
    bool operator()(std::pair<size_t, size_t> const &x,
                    std::pair<size_t, size_t> const &y) const
    {
        // Order by index first; for equal indices, the later insertion comes first.
        return x.first < y.first
            || (x.first == y.first && x.second > y.second);
    }
};

typedef std::map<std::pair<size_t, size_t>, int, lt> array_like;

void insert(array_like &a, int value, size_t i)
{
    a[std::make_pair(i, a.size())] = value;
}
Regarding your comment:
List.Insert() (which is too slow as it has to shift every element over),
Lists don't shift their values; they iterate over them to find the location where you want to insert. Be careful what you say; this can be confusing to newbies like me.
A solution that's included with GCC by default is the rope data structure. Here is the documentation. Typically, ropes come to mind when working with long strings of characters. Here we have ints instead of characters, but it works the same. Just use int as the template parameter. (Could also be pairs, etc.)
Here's the description of rope on Wikipedia.
Basically, it's a binary tree that maintains how many elements are in the left and right subtrees (or equivalent information, which is what's referred to as order statistics), and these counts are updated appropriately as subtrees are rotated when elements are inserted and removed. This allows O(lg n) operations.
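As a rough illustration, the question's example sequence might look like the sketch below. This assumes GCC's non-standard <ext/rope> header and that __gnu_cxx::rope provides an insert(position, value) overload and indexed access; check the header shipped with your toolchain before relying on the exact signatures.

#include <ext/rope>   // GCC extension, not part of the C++ standard
#include <iostream>

int main()
{
    __gnu_cxx::rope<int> r;
    r.insert(0, 1);   // {1}
    r.insert(1, 3);   // {1,3}
    r.insert(1, 2);   // {1,2,3}
    r.insert(0, 5);   // {5,1,2,3}

    for (std::size_t i = 0; i < r.size(); ++i)
        std::cout << r[i] << ' ';   // prints: 5 1 2 3
    std::cout << '\n';
}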
There's this data structure which pushes insertion time down from O(N) to O(sqrt(N)) but I'm not that impressed. I feel one should be able to do better but I'll have to work at it a bit.
In C++ you can just use a map of vectors, like so:

#include <iostream>
#include <map>
#include <vector>
using namespace std;

int main() {
    map<int, vector<int> > data;
    data[0].push_back(1);
    data[1].push_back(3);
    data[1].push_back(2);
    data[0].push_back(5);

    map<int, vector<int> >::iterator it;
    for (it = data.begin(); it != data.end(); it++) {
        vector<int>& v = it->second;
        for (int i = v.size() - 1; i >= 0; i--) {
            cout << v[i] << ' ';
        }
    }
    cout << '\n';
}
This prints:
5 1 2 3
Just like you want, and inserts are O(log n).
In this question I'm not asking how to do it but HOW IT IS DONE.
I'm trying (as an exercise) to implement a simple map, and although I have no problems implementing the links and their behavior (how to find the next place to insert a new link, etc.), I'm stuck on how to implement iteration over a map. When you think about it and look at std::map, this map is able to return a begin and an end iterator. How? Especially end?
If a map is a tree, how can you say which branch of this tree is the end? I just do not understand it. And how do you iterate over a map? Start from the top of the tree and then what? Go and list everything on the left? But the nodes on the left also have links to the right. I really don't know. I would be really glad if someone could explain it to me or give me a link so I could read about it.
A map is implemented using a binary search tree. To meet the complexity requirements it has to be a self-balancing tree, so a red-black tree is usually used, but that doesn't affect how you iterate over the tree.
To read the elements out of a binary search tree in order from least to greatest, you need to perform an in-order traversal of the tree. The recursive implementation is quite simple but isn't really practical for use in an iterator (the iterator would have to maintain a stack internally, which would make it relatively expensive to copy).
You can implement an iterative in-order traversal. This is an implementation taken from a library of tree containers I wrote a while ago. NodePointerT is a pointer to a node, where the node has left_, right_, and parent_ pointers of type NodePointerT.
// Gets the next node in an in-order traversal of the tree; returns null
// when the in-order traversal has ended
template <typename NodePointerT>
NodePointerT next_inorder_node(NodePointerT n)
{
    if (!n) { return n; }

    // If the node has a right child, we traverse the link to that child
    // then traverse as far to the left as we can:
    if (n->right_)
    {
        n = n->right_;
        while (n->left_) { n = n->left_; }
    }
    // If the node is the left node of its parent, the next node is its
    // parent node:
    else if (n->parent_ && n == n->parent_->left_)
    {
        n = n->parent_;
    }
    // Otherwise, this node is the furthest right in its subtree; we
    // traverse up through its parents until we find a parent that was a
    // left child of a node.  The next node is that node's parent.  If
    // we have reached the end, this will set node to null:
    else
    {
        while (n->parent_ && n == n->parent_->right_) { n = n->parent_; }
        n = n->parent_;
    }
    return n;
}
To find the first node for the begin iterator, you need to find the leftmost node in the tree. Starting at the root node, follow the left child pointer until you encounter a node that has no left child: this is the first node.
For an end iterator, you can set the node pointer to point to the root node or to the last node in the tree and then keep a flag in the iterator indicating that it is an end iterator (is_end_ or something like that).
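A minimal sketch of that begin() lookup, using the same node layout as the traversal function above:

// Returns the leftmost (smallest) node of the tree, i.e. the node the
// begin() iterator should refer to; returns null for an empty tree.
template <typename NodePointerT>
NodePointerT first_inorder_node(NodePointerT root)
{
    if (!root) { return root; }
    while (root->left_) { root = root->left_; }
    return root;
}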
The representation of your map's iterator is totally up to you. I think it should suffice to use a single wrapped pointer to a node. E.g.:
template <typename T>
struct mymapiterator
{
    typename mymap<T>::node * n;
};
Or something similar. Now, mymap::begin() could return an instance of the iterator whose n points to the leftmost node. mymap::end() could return an instance with n pointing to the root, or to some other special node from which it is still possible to get back to the rightmost node, so that bidirectional iteration from the end iterator remains possible.
The operations for moving between nodes (operator++(), operator--(), etc.) are about traversing the tree from smaller to bigger values or vice versa, traversal logic you have probably already implemented, at least in part, while writing insertion.
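For example, operator++() can be little more than a wrapper around an in-order successor walk such as the next_inorder_node function shown earlier. This sketch assumes mymap's node exposes the same left_/right_/parent_ members used there, and (as one possible choice) lets a null n represent the end iterator:

template <typename T>
struct mymapiterator
{
    typename mymap<T>::node* n;

    mymapiterator& operator++()
    {
        // Step to the in-order successor; n becomes null after the rightmost
        // node, which with this scheme serves as the end iterator.
        n = next_inorder_node(n);
        return *this;
    }
};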
For sorting purposes, a map behaves like a sorted key/value container (a.k.a. a dictionary); you can think of it as a sorted collection of key/value pairs, and this is exactly what you get when you query for an iterator. Observe:
map<string, int> my_map;
my_map["Hello"] = 1;
my_map["world"] = 2;
for (map<string, int>::const_iterator i = my_map.begin(); i != my_map.end(); ++i)
cout << i->first << ": " << i->second << endl;
Just like any other iterator type, the map iterator behaves like a pointer to a collection element, and for map, this is a std::pair, where first maps to the key and second maps to the value.
std::map uses a binary search internally when you call its find() method or use operator[], but you shouldn't ever need to access the tree representation directly.
One big trick you may be missing is that the end() iterator does not need to point to anything. It can be NULL or any other special value.
The ++ operator sets an iterator to the same special value when it goes past the end of the map. Then everything works.
To implement ++ you might need to keep next/prev pointers in each node, or you could walk back up the tree to find the next node, comparing the node you just left with the parent's children to decide whether the parent itself comes next or whether you need to keep climbing, etc.
Don't forget that the iterators to a map should stay valid during insert/erase operations (as long as you didn't erase the iterator's current node).
This is similar to a recent question.
I will be maintaining a sorted list of values. I will be inserting items of arbitrary value into the list. Each time I insert a value, I would like to determine its ordinal position in the list (is it 1st, 2nd, 1000th?). What is the most efficient data structure and algorithm for accomplishing this? There are obviously many algorithms which could allow you to do this, but I don't see any easy way to do it using plain STL or Qt template functionality. Ideally, I would like to know about existing open-source C++ libraries or sample code that can do this.
I can imagine how to modify a B-tree or similar structure for this purpose, but it seems like there should be an easier way.
Edit 3:
Mike Seymour pretty well confirmed what I wrote in my original post: there is indeed no way to accomplish this task using plain STL. So I'm looking for a good btree, balanced-tree or similar open-source C++ template which can accomplish this without modification, or with the least modification possible. Pavel Shved showed this is possible, but I'd prefer not to dive into implementing a balanced tree myself.
(the history should show my unsuccessful efforts to modify Mathieu's code to be O(log N) using make_heap)
Edit 4:
I still give credit to Pavel for pointing out that a btree can give a solution to this, but I have to mention that the simplest way to achieve this kind of functionality, without implementing a custom btree C++ template of your own, is to use an in-memory database. That gives you O(log n) and is fairly easy to implement.
A binary tree is fine for this. Modifying it is easy as well: just keep in each node the number of nodes in its subtree.
After you have inserted a node, search for it again by walking from the root down to that node, and update the index recursively along the way:
if (traverse to left subtree)
    index = index_on_previous_stage;
if (traverse to right subtree)
    index = index_on_previous_stage + left_subtree_size + 1;
if (found)
    return index + left_subtree_size;
This will take O(log N) time, just like inserting.
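As a rough sketch of that idea (the node layout, names, and the decision to send larger values to the right are assumptions made for this example):

#include <cstddef>

struct Node {
    int         value;
    std::size_t subtree_size;  // number of nodes in this subtree, including this one
    Node*       left;
    Node*       right;
};

// Returns the 0-based ordinal position of 'value' in the tree rooted at 'root',
// assuming the value is present and subtree_size is kept up to date on insert.
std::size_t rank_of(const Node* root, int value)
{
    std::size_t index = 0;
    for (const Node* n = root; n != nullptr; ) {
        std::size_t left_size = n->left ? n->left->subtree_size : 0;
        if (value < n->value) {
            n = n->left;                 // index unchanged
        } else if (value > n->value) {
            index += left_size + 1;      // skip the left subtree and this node
            n = n->right;
        } else {
            return index + left_size;    // found it
        }
    }
    return index;                        // not found: this is where it would be inserted
}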
I think you can use std::set here. It keeps the values sorted, and insert returns an iterator to the position where the value was inserted; from that iterator you can get the index. For example:
std::set<int> s;
std::pair<std::set<int>::iterator, bool> aPair = s.insert(5);
size_t index = std::distance(s.begin(), aPair.first);
Note that the std::list insert(it, value) member function returns an iterator to the newly inserted element. Maybe it can help?
If, as you say in one of your comments, you only need an approximate ordinal position,
you could estimate it from the range of values you already have; you only need to read the smallest and largest values in the collection, in constant time, something like this:
multiset<int> values;
values.insert(value);
int ordinal = values.size() * (value - *values.begin()) /
              (*values.rbegin() - *values.begin());
To improve the approximation, you could keep track of statistical properties (mean and variance, and possibly higher-order moments for better accuracy) of the values as you add them to a multiset. This will still be constant time. Here's a vague sketch of the sort of thing you might do:
class SortedValues : public multiset<int>
{
public:
    SortedValues() : sum(0), sum2(0) {}

    int insert(int value)
    {
        // Insert the value and update the running totals
        multiset<int>::insert(value);
        sum += value;
        sum2 += value*value;
        // Calculate the mean and deviation (variance = E[x^2] - mean^2).
        const float mean = float(sum) / size();
        const float deviation = sqrt(float(sum2)/size() - mean*mean);
        // This function is left as an exercise for the reader.
        return size() * EstimatePercentile(value, mean, deviation);
    }

private:
    long long sum;   // wide enough to avoid overflowing over many values
    long long sum2;
};
If you want the ordinal position, you want a container which models the RandomAccessContainer concept... basically, a std::vector.
Sort-related operations on a std::vector are relatively fast, and you can get to the position you wish using std::lower_bound or std::upper_bound. You can decide for yourself whether you want to handle multiple equal values at once; to retrieve all equal values, a good way is std::equal_range, which basically gives you the same result as applying both the lower and upper bounds, but with better complexity.
Now, for the ordinal position, the great news is that std::distance has O(1) complexity on models of RandomAccessIterator.
typedef std::vector<int> ints_t;
typedef ints_t::iterator iterator;

ints_t myInts;

// 'another' is whatever sequence holds the incoming values.
for (iterator it = another.begin(), end = another.end(); it != end; ++it)
{
    int myValue = *it;
    iterator search = std::lower_bound(myInts.begin(), myInts.end(), myValue);
    // insert() invalidates 'search', so keep the iterator it returns instead.
    search = myInts.insert(search, myValue);
    std::cout << "Inserted " << myValue << " at "
              << std::distance(myInts.begin(), search) << "\n";
    // Not necessary to flush there, that would slow things down
}

// Find all values equal to 50
std::pair<iterator,iterator> myPair =
    std::equal_range(myInts.begin(), myInts.end(), 50);
std::cout << "There are " << std::distance(myPair.first, myPair.second)
          << " values '50' in the vector, starting at index "
          << std::distance(myInts.begin(), myPair.first) << std::endl;
Easy, isn't it ?
std::lower_bound, std::upper_bound and std::equal_range have O(log(n)) complexity and std::distance has O(1) complexity, so everything there is quite efficient...
EDIT: as pointed out in the comments, inserting into the vector is actually O(n), since you have to move the elements around.
Why do you need the ordinal position? As soon as you insert another item in the list the ordinal positions of other items later in the list will change, so there doesn't seem to be much point in finding the ordinal position when you do an insert.
It may be better to simply append elements to a vector, sort and then use a binary search to find the ordinal position, but it depends on what you are really trying to achieve
If you have the iterator to the item (as suggested by dtrosset), you can use std::distance (e.g. std::distance(my_list.begin(), item_it))
If you have an iterator that you want to find the index of, then use std::distance,
which is either O(1) or O(n) depending on the container. However, the O(1) containers are going to have O(n) inserts, so overall you are looking at an O(n) algorithm with any STL container.
As others have said, it's not immediately obvious why this is useful.