std::set and its front_insert_iterator - c++

I know I can do this:
std::vector<double> vec;
std::back_insert_iterator<std::vector<double> > it( back_inserter(vec) );
it = 4.5;
But I'd like to do something similar (in syntax) to std::set (and use a front_insert_iterator instead of a reference to the set and use set::insert). Is it possible, or am I forced to use a reference o the set? Maybe I should use std::merge and/or std::set_intersect (this would allow for good error reporting in case of duplicates)? Would this be a good approach?
Thanks!

You don't push_back or push_front to set, because (conceptually) a set is a sorted associative container. You can do this, though:
#include <cstdlib>
#include <set>
#include <iterator>
using namespace std;
int main()
{
typedef set<int> MySet;
MySet si;
insert_iterator<MySet> it(si, si.begin());
*it = 1;
*it = 2;
}
EDIT:
Note that the begin() that the iterator is initialized with is not where the elements get put. Rather, it is a hint to the STL where to start looking for where to put the item.
EDIT2:
As per the comments below, you also wanted a way to check the "disposition" of the inserted item. That is, a way to tell if the item was or was not already present before you inserted it.
You cannot do this directly using only the iterator. If you need this information, you have two choices.
1) Don't use the insert iterator. The only way to get the bool you get from set::insert is to call set::insert. So call set::insert
42) Check the size of the set both before and after insertion. If the size grew by one, the item was inserted. :) I've market this as item #42 because IMO it is much less favorable than just calling insert directly for a number of reasons. There may be multithreading issues, there may be a performance hit in computing size(), etc.

You can't insert to the "back" or "front" of a set. Perhaps what you're looking for is std::inserter.
http://stdcxx.apache.org/doc/stdlibref/insert-iterator.html

To use front_insert_iterator the container must have member push_front defined (such as standard containers deque and list).

Related

Iterator Loop vs index loop [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Why use iterators instead of array indices?
I'm reviewing my knowledge on C++ and I've stumbled upon iterators. One thing I want to know is what makes them so special and I want to know why this:
using namespace std;
vector<int> myIntVector;
vector<int>::iterator myIntVectorIterator;
// Add some elements to myIntVector
myIntVector.push_back(1);
myIntVector.push_back(4);
myIntVector.push_back(8);
for(myIntVectorIterator = myIntVector.begin();
myIntVectorIterator != myIntVector.end();
myIntVectorIterator++)
{
cout<<*myIntVectorIterator<<" ";
//Should output 1 4 8
}
is better than this:
using namespace std;
vector<int> myIntVector;
// Add some elements to myIntVector
myIntVector.push_back(1);
myIntVector.push_back(4);
myIntVector.push_back(8);
for(int y=0; y<myIntVector.size(); y++)
{
cout<<myIntVector[y]<<" ";
//Should output 1 4 8
}
And yes I know that I shouldn't be using the std namespace. I just took this example off of the cprogramming website. So can you please tell me why the latter is worse? What's the big difference?
The special thing about iterators is that they provide the glue between algorithms and containers. For generic code, the recommendation would be to use a combination of STL algorithms (e.g. find, sort, remove, copy) etc. that carries out the computation that you have in mind on your data structure (vector, list, map etc.), and to supply that algorithm with iterators into your container.
Your particular example could be written as a combination of the for_each algorithm and the vector container (see option 3) below), but it's only one out of four distinct ways to iterate over a std::vector:
1) index-based iteration
for (std::size_t i = 0; i != v.size(); ++i) {
// access element as v[i]
// any code including continue, break, return
}
Advantages: familiar to anyone familiar with C-style code, can loop using different strides (e.g. i += 2).
Disadvantages: only for sequential random access containers (vector, array, deque), doesn't work for list, forward_list or the associative containers. Also the loop control is a little verbose (init, check, increment). People need to be aware of the 0-based indexing in C++.
2) iterator-based iteration
for (auto it = v.begin(); it != v.end(); ++it) {
// if the current index is needed:
auto i = std::distance(v.begin(), it);
// access element as *it
// any code including continue, break, return
}
Advantages: more generic, works for all containers (even the new unordered associative containers, can also use different strides (e.g. std::advance(it, 2));
Disadvantages: need extra work to get the index of the current element (could be O(N) for list or forward_list). Again, the loop control is a little verbose (init, check, increment).
3) STL for_each algorithm + lambda
std::for_each(v.begin(), v.end(), [](T const& elem) {
// if the current index is needed:
auto i = &elem - &v[0];
// cannot continue, break or return out of the loop
});
Advantages: same as 2) plus small reduction in loop control (no check and increment), this can greatly reduce your bug rate (wrong init, check or increment, off-by-one errors).
Disadvantages: same as explicit iterator-loop plus restricted possibilities for flow control in the loop (cannot use continue, break or return) and no option for different strides (unless you use an iterator adapter that overloads operator++).
4) range-for loop
for (auto& elem: v) {
// if the current index is needed:
auto i = &elem - &v[0];
// any code including continue, break, return
}
Advantages: very compact loop control, direct access to the current element.
Disadvantages: extra statement to get the index. Cannot use different strides.
What to use?
For your particular example of iterating over std::vector: if you really need the index (e.g. access the previous or next element, printing/logging the index inside the loop etc.) or you need a stride different than 1, then I would go for the explicitly indexed-loop, otherwise I'd go for the range-for loop.
For generic algorithms on generic containers I'd go for the explicit iterator loop unless the code contained no flow control inside the loop and needed stride 1, in which case I'd go for the STL for_each + a lambda.
With a vector iterators do no offer any real advantage. The syntax is uglier, longer to type and harder to read.
Iterating over a vector using iterators is not faster and is not safer (actually if the vector is possibly resized during the iteration using iterators will put you in big troubles).
The idea of having a generic loop that works when you will change later the container type is also mostly nonsense in real cases. Unfortunately the dark side of a strictly typed language without serious typing inference (a bit better now with C++11, however) is that you need to say what is the type of everything at each step. If you change your mind later you will still need to go around and change everything. Moreover different containers have very different trade-offs and changing container type is not something that happens that often.
The only case in which iteration should be kept if possible generic is when writing template code, but that (I hope for you) is not the most frequent case.
The only problem present in your explicit index loop is that size returns an unsigned value (a design bug of C++) and comparison between signed and unsigned is dangerous and surprising, so better avoided. If you use a decent compiler with warnings enabled there should be a diagnostic on that.
Note that the solution is not to use an unsiged as the index, because arithmetic between unsigned values is also apparently illogical (it's modulo arithmetic, and x-1 may be bigger than x). You instead should cast the size to an integer before using it.
It may make some sense to use unsigned sizes and indexes (paying a LOT of attention to every expression you write) only if you're working on a 16 bit C++ implementation (16 bit was the reason for having unsigned values in sizes).
As a typical mistake that unsigned size may introduce consider:
void drawPolyline(const std::vector<P2d>& points)
{
for (int i=0; i<points.size()-1; i++)
drawLine(points[i], points[i+1]);
}
Here the bug is present because if you pass an empty points vector the value points.size()-1 will be a huge positive number, making you looping into a segfault.
A working solution could be
for (int i=1; i<points.size(); i++)
drawLine(points[i - 1], points[i]);
but I personally prefer to always remove unsinged-ness with int(v.size()).
PS: If you really don't want to think by to yourself to the implications and simply want an expert to tell you then consider that a quite a few world recognized C++ experts agree and expressed opinions on that unsigned values are a bad idea except for bit manipulations.
Discovering the ugliness of using iterators in the case of iterating up to second-last is left as an exercise for the reader.
Iterators make your code more generic.
Every standard library container provides an iterator hence if you change your container class in future the loop wont be affected.
Iterators are first choice over operator[]. C++11 provides std::begin(), std::end() functions.
As your code uses just std::vector, I can't say there is much difference in both codes, however, operator [] may not operate as you intend to. For example if you use map, operator[] will insert an element if not found.
Also, by using iterator your code becomes more portable between containers. You can switch containers from std::vector to std::list or other container freely without changing much if you use iterator such rule doesn't apply to operator[].
It always depends on what you need.
You should use operator[] when you need direct access to elements in the vector (when you need to index a specific element in the vector). There is nothing wrong in using it over iterators. However, you must decide for yourself which (operator[] or iterators) suits best your needs.
Using iterators would enable you to switch to other container types without much change in your code. In other words, using iterators would make your code more generic, and does not depend on a particular type of container.
By writing your client code in terms of iterators you abstract away the container completely.
Consider this code:
class ExpressionParser // some generic arbitrary expression parser
{
public:
template<typename It>
void parse(It begin, const It end)
{
using namespace std;
using namespace std::placeholders;
for_each(begin, end,
bind(&ExpressionParser::process_next, this, _1);
}
// process next char in a stream (defined elsewhere)
void process_next(char c);
};
client code:
ExpressionParser p;
std::string expression("SUM(A) FOR A in [1, 2, 3, 4]");
p.parse(expression.begin(), expression.end());
std::istringstream file("expression.txt");
p.parse(std::istringstream<char>(file), std::istringstream<char>());
char expr[] = "[12a^2 + 13a - 5] with a=108";
p.parse(std::begin(expr), std::end(expr));
Edit: Consider your original code example, implemented with :
using namespace std;
vector<int> myIntVector;
// Add some elements to myIntVector
myIntVector.push_back(1);
myIntVector.push_back(4);
myIntVector.push_back(8);
copy(myIntVector.begin(), myIntVector.end(),
std::ostream_iterator<int>(cout, " "));
The nice thing about iterator is that later on if you wanted to switch your vector to a another STD container. Then the forloop will still work.
its a matter of speed. using the iterator accesses the elements faster. a similar question was answered here:
What's faster, iterating an STL vector with vector::iterator or with at()?
Edit:
speed of access varies with each cpu and compiler

insert in unordered_set a new element: should the hint be end()?

If I am sure that a certain value is not yet into an unordered_set, and I am going to insert such value, is it correct to pass this set end() iterator as a hint?
EDIT:
Code:
#include <unordered_set>
using namespace std;
unordered_set<int> someset;
int main(){
auto it=someset.find(0);
if(it==someset.end()) someset.insert(it, 0); //correct? possible performance boost if the set is actually populated?
}
I think, you can simply call insert function and the returned value will tell you whether value is inserted, or it is already present in the set.
auto p = someset.insert(value);
if (!p.second)
{
std::cout << "value was already present in the set" << std::endl;
}
Actually p is of type std::pair<iterator,bool>, so p.second tells you whether value is inserted, or it is already present in the set, and p.first is the iterator which tells you the position of the value.
Remember that this is faster than your approach, because my solution reduces the overall work.
I assume you are referring to iterator insert ( const_iterator hint, value_type&& val ); which is a member of C++11's unordered_set. As described here the hint is used for performance optimization when inserting new elements. The insertion / position of the new element is based on hashes. So if you know how the hash is generated for your value_type you can pre-generate it and give a hint to the container.
However, the compiler may decide not to use it. So my hypothesis is: using end() may be used but it may not have any effect.
I believe the hint is of no use to unordered_set, the same way it is of no use to unordered_map. These methods exist to maintain these containers' interface compatible with their ordered versions (set and map, respectively).
Read more here: std::unordered_map insert with hint

Memory Allocation in C++, Using a Map of Linked Lists

The underlying data structure I am using is:
map<int, Cell> struct Cell{ char c; Cell*next; };
In effect the data structure maps an int to a linked list. The map(in this case implemented as a hashmap) ensures that finding a value in the list runs in constant time. The Linked List ensures that insertion and deletion also run in constant time. At each processing iteration I am doing something like:
Cell *cellPointer1 = new Cell;
//Process cells, build linked list
Once the list is built I put the elements Cell in map. The structure was working just fine and after my program I deallocate memory. For each Cell in the list.
delete cellPointer1
But at the end of my program I have a memory leak!!
To test memory leak I use:
#include <stdlib.h>
#include <crtdbg.h>
#define _CRTDBG_MAP_ALLOC
_CrtDumpMemoryLeaks();
I'm thinking that somewhere along the way the fact that I am putting the Cells in the map does not allow me to deallocate the memory correctly. Does anyone have any ideas on how to solve this problem?
We'll need to see your code for insertion and deletion to be sure about it.
What I'd see as a memleak-free insert / remove code would be:
( NOTE: I'm assuming you don't store the Cells that you allocate in the map )
//
// insert
//
std::map<int, Cell> _map;
Cell a; // no new here!
Cell *iter = &a;
while( condition )
{
Cell *b = new Cell();
iter->next = b;
iter = b;
}
_map[id] = a; // will 'copy' a into the container slot of the map
//
// cleanup:
//
std::map<int,Cell>::iterator i = _map.begin();
while( i != _map.end() )
{
Cell &a = i->second;
Cell *iter = a.next; // list of cells associated to 'a'.
while( iter != NULL )
{
Cell *to_delete = iter;
iter = iter->next;
delete to_delete;
}
_map.erase(i); // will remove the Cell from the map. No need to 'delete'
i++;
}
Edit: there was a comment indicating that I might not have understood the problem completely. If you insert ALL the cells you allocate in the map, then the faulty thing is that your map contains Cell, not Cell*.
If you define your map as: std::map<int, Cell *>, your problem would be solved at 2 conditions:
you insert all the Cells that you allocate in the map
the integer (the key) associated to each cell is unique (important!!)
Now the deletion is simply a matter of:
std::map<int, Cell*>::iterator i = _map.begin();
while( i != _map.end() )
{
Cell *c = i->second;
if ( c != NULL ) delete c;
}
_map.clear();
I've built almost the exact same hybrid data structure you are after (list/map with the same algorithmic complexity if I were to use unordered_map instead) and have been using it from time to time for almost a decade though it's a kind of bulky structure (something I'd use with convenience in mind more than efficiency).
It's worth noting that this is quite different from just using std::unordered_map directly. For a start, it preserves the original order in which one inserts elements. Insertion, removal, and searches are guaranteed to happen in logarithmic time (or constant time depending on whether key searching is involved and whether you use a hash table or BST), iterators do not get invalidated on insertion/removal (the main requirement I needed which made me favor std::map over std::unordered_map), etc.
The way I did it was like this:
// I use this as the iterator for my container with
// the list being the main 'focal point' while I
// treat the map as a secondary structure to accelerate
// key searches.
typedef typename std::list<Value>::iterator iterator;
// Values are stored in the list.
std::list<Value> data;
// Keys and iterators into the list are stored in a map.
std::map<Key, iterator> accelerator;
If you do it like this, it becomes quite easy. push_back is a matter of pushing back to the list and adding the last iterator to the map, iterator removal is a matter of removing the key pointed to by the iterator from the map before removing the element from the list as the list iterator, finding a key is a matter of searching the map and returning the associated value in the map which happens to be the list iterator, key removal is just finding a key and then doing iterator removal, etc.
If you want to improve all methods to constant time, then you can use std::unordered_map instead of std::map as I did here (though that comes with some caveats).
Taking an approach like this should simplify things considerably over an intrusive list-based solution where you're manually having to free memory.
Is there a reason why you are not using built-in containers like, say, STL?
Anyhow, you don't show the code where the allocation takes place, nor the map definition (is this coming from a library?).
Are you sure you deallocate all of the previously allocated Cells, starting from the last one and going backwards up to the first?
You could do this using the STL (remove next from Cell):
std::unordered_map<int,std::list<Cell>>
Or if cell only contains a char
std::unordered_map<int,std::string>
If your compiler doesn't support std::unordered_map then try boost::unordered_map.
If you really want to use intrusive data structures, have a look at Boost Intrusive.
As others have pointed out, it may be hard to see what you're doing wrong without seeing your code.
Someone should mention, however, that you're not helping yourself by overlaying two container types here.
If you're using a hash_map, you already have constant insertion and deletion time, see the related Hash : How does it work internally? post. The only exception to the O(c) lookup time is if your implementation decides to resize the container, in which case you have added overhead regardless of your linked list addition. Having two addressing schemes is only going to make things slower (not to mention buggier).
Sorry if this doesn't point you to the memory leak, but I'm sure a lot of memory leaks / bugs come from not using stl / boost containers to their full potential. Look into that first.
You need to be very careful with what you are doing, because values in a C++ map need to be copyable and with your structure that has raw pointers, you must handle your copy semantics properly.
You would be far better off using std::list where you won't need to worry about your copy semantics.
If you can't change that then at least std::map<int, Cell*> will be a bit more manageable, although you would have to manage the pointers in your map because std::map will not manage them for you.
You could of course use std::map<int, shared_ptr<Cell> >, probably easiest for you for now.
If you also use shared_ptr within your Cell object itself, you will need to beware of circular references, and as Cell will know it's being shared_ptr'd you could derive it from enable_shared_from_this
My final point will be that list is very rarely the correct collection type to use. It is the correct one to use sometimes, especially when you have an LRU cache situation and you want to move accessed elements to the end of the list fast. However that is the minority case and it probably doesn't apply here. Think of an alternative collection you really want. map< int, set<char> > perhaps? or map< int, vector< char > > ?
Your list has a lot of overheads to store a few chars

Does the C++ standard library have a set ordered by insertion order?

Does the C++ standard library have an "ordered set" datastructure? By ordered set, I mean something that is exactly the same as the ordinary std::set but that remembers the order in which you added the items to it.
If not, what is the best way to simulate one? I know you could do something like have a set of pairs with each pair storing the number it was added in and the actual value, but I dont want to jump through hoops if there is a simpler solution.
No single, homogeneous data structure will have this property, since it is either sequential (i.e. elements are arranged in insertion order) or associative (elements are arranged in some order depending on value).
The best, clean approach would perhaps be something like Boost.MultiIndex, which allows you to add multiple indexes, or "views", on a container, so you can have a sequential and an ordered index.
Instead of making a std::set of whatever type you're using, why not pass it a std::pair of the object and an index that gets incremented at each insertion?
No, it does not.
Such a container presumably would need two different iterators, one to iterate in the order defined by the order of adding, and another to iterate in the usual set order. There's nothing of that kind in the standard libraries.
One option to simulate it is to have a set of some type that contains an intrusive linked list node in addition to the actual data you care about. After adding an element to the set, append it to the linked list. Before removing an element from the set, remove it from the linked list. This is guaranteed to be OK, since pointers to set elements aren't invalidated by any operation other than removing that element.
I thought the answer is fairly simple, combine set with another iteratable structure (say, queue). If you like to iterate the set in the order that the element been inserted, push the elements in queue first, do your work on the front element, then pop out, put into set.
[Disclaimer: I have given a similar answer to this question already]
If you can use Boost, a very straightforward solution is to use the header-only library Boost.Bimap (bidirectional maps).
Consider the following sample program that will display some dummy entries in insertion order (try out here):
#include <iostream>
#include <string>
#include <type_traits>
#include <boost/bimap.hpp>
using namespace std::string_literals;
template <typename T>
void insertByOrder(boost::bimap<T, size_t>& mymap, const T& element) {
using pos = typename std::remove_reference<decltype(mymap)>::type::value_type;
// We use size() as index, therefore indexing the elements with 0, 1, ...
mymap.insert(pos(element, mymap.size()));
}
int main() {
boost::bimap<std::string, size_t> mymap;
insertByOrder(mymap, "stack"s);
insertByOrder(mymap, "overflow"s);
// Iterate over right map view (integers) in sorted order
for (const auto& rit : mymap.right) {
std::cout << rit.first << " -> " << rit.second << std::endl;
}
}
The funky type alias in insertByOrder() is needed to insert elements into a boost::bimap in the following line (see referenced documentation).
Yes, it's called a vector or list (or array). Just appends to the vector to add element to the set.

Remove elements from list in a loop without break

In the code of my game, I want to remove some elements from a list,
which happens in a loop.The only problem I have is, when I use
list::erase I have to break after that function because I think
the list becomes "outdated". This causes a little flicker and I want to
try to remove it.
The current code is this:
for(list<Arrow*>::iterator it = arrows.begin(); it != arrows.end(); it++)
{
Arrow* a = (*it);
if(a->isActive() == true)
{
a->update();
}
else
{
arrows.erase(it);
break;
}
}
Thank you in advance!
Edit:
Sorry, I was confused with vector and list. Got the answer, thanks!
you should do:
it = arrows.erase(it);
//
list<Arrow*>::iterator it = arrows.begin();
while (it != arrows.end())
{
Arrow* a = (*it);
if(a->isActive())
{
a->update(); ++it;
}
else{ // delete (a); ???
it=arrows.erase(it);}
}
I am confused, you say vector and in your example you are using a list.
List is implemented with a double linked-list (actually it is likely to be implemented, because the standard fix just the complexity and not the details). The iterator after erasing are still valid. http://www.cplusplus.com/reference/stl/list/erase/
Erasing in the middle with vector is slow and also invalided all the iterators and references.
Like everyone said you example is confusing. If you want delete in the mid of your container and if you are using
1>vector:
Then every iterator and reference after the point of erase is invalidated.
Also vector deleting from mid of vector will causing element behind it to shift which might considered slow if you want performance.
2>list:
Then only the deleted iterator and reference is invalidated.
Either way using the erase-remove idiom is preferred when you want to delete some element in the middle of stl sequence container.
The proper way to filter out items in a standard library container is to use the Erase-Remove Idiom. Because you're using a member function as the test in the loop, you should adapt the example to use std::remove_if().
You can implement this in a short and sweet (provided you like functional programming) way:
#include <algorithm>
#include <functional>
arrows.erase(std::remove_if(
arrows.begin(), arrows.end(), std::mem_fun(&Arrow::isActive)
));
This will work for std::vector<>, std::deque<>, std::list<> etc. regardless of implementation and iterator invalidation semantics.
Edit: I see you're also using the Arrow::update() method inside the loop. You can either do a double pass on the list, use a function object to call both methods or write the loop manually.
In the last case, you can use the it = arrows.erase(it); trick, but this will only be efficient for std::list<>. The loop will have O(n^2) complexity for the
std::vector<> and std::deque<> containers.