I've been fighting for several hours to understand why begin() of a vector within a map doesn't return the same address depending on whether I call begin() on the vector itself or on the second member of the map.
Let me explain:
I have a class containing a map whose key is an int and whose content is a vector of int. I have to go through the map and keep in memory the position where I was 'just before', so I also have another map of iterators (of the first map).
So, I would like to get something like:
map1 : <2,<3,8,1,3,7,1>>
map2 : <8,<6,9,1,3>>
map3 : <1,<3,1>>
etc.
To make it simpler, in my code, the T_IPC_CommandId is just an enum of int.
I have a method called Add(int TopCommand, int Command) which fills/creates the map of vectors of commands. For example, to create map3, I will code:
Add(1,3), Add(1,1)
I have a method called GetNext(int Command) which returns the next int in the map pointed to by Command. For example in map2, calling GetNext(8) will return 6, and the next calls will return 9, then 1, then 3, and 0 from then on.
In order to know which int to return, I need to keep track, for each map, of the next int to return. So I use an IPC_CommandId_Pointer, which is a map whose key is the command itself and which holds the iterator marking where I am in the map. And here is the problem:
At each Add() call, I initialise this IPC_CommandId_Pointer to the beginning of the main map. Like this:
void T_ListOfCommand::Add(T_IPC_CommandId Top_CommandId, T_IPC_CommandId IPC_CommandId)
{
T_IPC_CommandId_Vec_Iter Vec_Iter;
T_IPC_CommandId_Vec Vec;
if ((IPC_CommandId_Map.find(Top_CommandId)) == IPC_CommandId_Map.end())
{
IPC_CommandId_Map[Top_CommandId].clear();
}
IPC_CommandId_Map[Top_CommandId].push_back(IPC_CommandId);
// Repeated at each add, but don't care ...
Vec_Iter = IPC_CommandId_Map[Top_CommandId].begin();
IPC_CommandId_Pointer[Top_CommandId] = Vec_Iter;
}
The problem I have is that, at each IPC_CommandId_Map[Top_CommandId].push_back(IPC_CommandId), the IPC_CommandId_Map[Top_CommandId].begin() doesn't return the same address.
Now, when replacing the:
Vec_Iter = IPC_CommandId_Map[Top_CommandId].begin();
IPC_CommandId_Pointer[Top_CommandId] = Vec_Iter;
With:
Vec = IPC_CommandId_Map[Top_CommandId];
IPC_CommandId_Pointer[Top_CommandId] = Vec.begin();
This works fine.
I'm supposed to be pointing to the same location whether I use begin() through the map or dereference the vector, aren't I? Well, it seems not.
If someone could explain the difference to me, that would be great.
vector::begin() is not guaranteed to return the same address each time!
If std::vector::push_back() causes reallocation of the vector's data (since its data has to be contiguous), the iterator (address) which begin() returns will be different.
And (as The Paramagnetic Croissant stated in the comment) the code Vec = IPC_CommandId_Map[Top_CommandId]; copies the element of the map to the variable Vec.
(Vocabulary check: you're not dereferencing anything, particularly not a vector.)
Vec_Iter = IPC_CommandId_Map[Top_CommandId].begin();
is an iterator into the vector in the map.
It becomes invalid when the vector is reallocated.
Every time the vector is reallocated you will get a different value for begin().
It's a bad idea to store this iterator across calls to push_back or anything else that may invalidate an iterator.
Vec = IPC_CommandId_Map[Top_CommandId];
IPC_CommandId_Pointer[Top_CommandId] = Vec.begin();
is an iterator into the local copy Vec.
It becomes invalid as soon as the function returns.
It's a bad idea to store this iterator anywhere beyond Vec's lifetime.
Overall, iterators should be considered transient and only be used in as small a scope as possible.
A much more robust solution is to store a "current index" for the vector instead of an iterator.
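The "current index" idea can be sketched roughly as follows. CommandList, commands_ and next_index_ are hypothetical names, and plain int stands in for T_IPC_CommandId; this is only one possible shape under those assumptions, not the asker's actual class.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for T_ListOfCommand, using int for the command type.
class CommandList {
public:
    // Append a command under the given top-level command.
    void Add(int top, int command) {
        commands_[top].push_back(command);
        next_index_[top] = 0;  // restart the cursor, as the original Add() does
    }

    // Return the next command for `top`, or 0 once the vector is exhausted
    // (or when `top` is unknown).
    int GetNext(int top) {
        auto it = commands_.find(top);
        if (it == commands_.end()) return 0;
        std::size_t& index = next_index_[top];
        if (index >= it->second.size()) return 0;
        return it->second[index++];
    }

private:
    std::map<int, std::vector<int>> commands_;
    std::map<int, std::size_t> next_index_;  // a position, not an iterator
};
```

An index stays meaningful when push_back reallocates the vector, which is exactly the case in which a stored iterator becomes invalid.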
Hello all :) Today I am refining my skills on graph theory and data structures. I decided to do a small project in C++ because it's been a while since I've worked in C++.
I want to make an adjacency list for a directed graph. In other words, something which looks like:
0-->1-->3
1-->2
2-->4
3-->
4-->
This would be a directed graph with V0 (vertex 0) having an edge to V1 and V3, V1 having an edge to V2, and V2 having an edge to V4, like this:
V0----->V1---->V2---->V4
|
|
v
V3
I know that in order to do this, I will need to create an adjacency list in C++. An adjacency list is basically an array of linked lists. Okay, let's see some pseudo C++ code:
#include <cstdio>
#include <iostream>
using namespace std;
struct graph{
//The graph is essentially an array of the adjList struct.
node* List[];
};
struct adjList{
//A simple linked list which can contain an int at each node in the list.
};
struct node {
int vertex;
node* next;
};
int main() {
//insert cool graph theory sorting algorithm here
}
As you can tell, this pseudocode is currently far from the mark. And that is what I wanted some help with -- pointers and structs in C++ have never been my strong suit. First of all, this takes care of the vertices that a vertex points to -- but what about the vertex itself? How can I keep track of that vertex? When I loop over the array, it will do me no good to only know what vertices are being pointed to, rather than also knowing what points to them. The first element in each list should probably be that vertex, and then the elements after that are the vertices it points to. But then, how can I access this first element of the list in my main program? (Sorry if this is convoluted or confusing, I would be happy to rephrase.)
I would like to be able to loop over this adjacency list to do some cool things with graphs. For example, to implement some graph theory algorithms (sorts, shortest paths, etc) using the adjacency list representation.
(Also, I had a question about the adjacency list. How is it different from just using a list of arrays? Why can't I just have a list with an array at each element in the list?)
You may use a vector in node as an adjacency list.
struct node { //struct rather than class, so that neighbors is accessible in the examples below
int value;
vector<node*> neighbors;
};
If the graph is known at compile time, you can use an array, but it's "a little bit" harder. If you know just the size of the graph (at compile time), you can do something like this.
template<unsigned int N>
class graph {
array<node, N> nodes;
};
To add a neighbor, do something like this (do not forget that numbering starts from zero):
nodes[i].neighbors.push_back(&nodes[j]); //or nodes.data() + j; a std::array does not decay to a pointer
Of course, you can build the adjacency list without pointers and work "above" a table. Then you have vector<int> in node and you push back the neighbour's number. With both representations of the graph, you can implement all algorithms which use an adjacency list.
And finally, I might add: some use a list instead of a vector, because removal is O(1). Mistake. For most algorithms, the order in the adjacency list is not important, so you can erase any element from a vector in O(1) time. Just swap it with the last element; pop_back has O(1) complexity. Something like this:
if(i != number_of_last_element_in_list) //neighbors.size() - 1
swap(neighbors[i], neighbors.back());
neighbors.pop_back();
Specific example (you have vector in node, C++11 (!)):
//creation of nodes, as previously
constexpr unsigned int N = 3;
array<node,N> nodes; //or array<node, 3> nodes;
//creating edge (adding neighbors), in the constructor, or somewhere
nodes[0].neighbors = {&nodes[1]};
nodes[1].neighbors = {&nodes[0], &nodes[1]};
//adding runtime, i,j from user, eg. i = 2, j = 0
nodes[i].neighbors.push_back(&nodes[j]); //nodes[2].neighbors = {&nodes[0]};
I believe it's clear. From 0 you can go to 1, from 1 to 0 and to itself, and (as in the example) from 2 to 0. It's a directed graph. If you want an undirected one, you should add the neighbour's address to both nodes. You can use numbers instead of pointers: put vector<unsigned int> in class node and push back numbers, not addresses.
As we know, you do not need to use pointers. Here is an example of that, too.
When the number of vertices may change, you can use a vector of nodes (vector<node>) instead of an array, and just resize it. The rest remains unchanged. For example:
vector<node> nodes(n); //you have n nodes
nodes.emplace_back(); //you added new node, or .resize(n+1)
//the graph can be generated at runtime here
//as previously, i,j from user, but now you have 'vector<unsigned int>' in node
nodes[i].neighbors.push_back(j);
But you can't erase a node, because that breaks the numbering! If you want to erase something, you should use a list of pointers (list<node*>). Otherwise you must keep non-existent vertices. Here, the order matters!
Regarding the line nodes.emplace_back(); //adding node: it is safe for a graph without pointers. If you want to use pointers, you generally shouldn't change the size of the graph.
You can accidentally change the addresses of some nodes while adding a vertex, when the vector runs out of space and is copied to a new place.
One way to deal with this is to use reserve, although you then have to know the maximal size of the graph! But in fact I encourage you not to use a vector to hold the vertices when you are using pointers. If you don't know the implementation, managing the memory yourself (smart pointers, e.g. shared_ptr, or just new) could be safer.
node* const graph = new node[size]; //<-- It is your graph.
//Here no address change accidentally.
Using a vector as the adjacency list is always fine! Growing it can't change any node's address.
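To make the pointer-free variant concrete, here is a minimal sketch in which neighbours are stored as indices into the node vector. IndexNode, IndexGraph, add_node and add_edge are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>
#include <vector>

struct IndexNode {
    int value;
    std::vector<std::size_t> neighbors;  // indices into IndexGraph::nodes
};

struct IndexGraph {
    std::vector<IndexNode> nodes;

    // Append a node and return its index; the vector may reallocate here.
    std::size_t add_node(int value) {
        nodes.push_back(IndexNode{value, {}});
        return nodes.size() - 1;
    }

    // Directed edge from -> to, stored as an index rather than an address.
    void add_edge(std::size_t from, std::size_t to) {
        nodes[from].neighbors.push_back(to);
    }
};
```

Because an edge is just a number, it still refers to the right node after the vector of nodes has been moved to a new place, which is not true of a stored node*.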
This may not be a very general approach, but that's how I handle adjacency lists in most cases. The C++ standard library (STL) provides a linked-list data structure named list.
Say you have N nodes in the graph, create a linked list for every node.
list<int> graph[N];
Now graph[i] represents the neighbours of node i. For every edge i to j, do
graph[i].push_back(j);
The main convenience is that there are no pointers to handle, and so no segmentation faults from mishandling them.
For more reference http://www.cplusplus.com/reference/list/list/
I suggest adding the adjacency list to the node structure,
and defining the graph structure as a list of nodes instead of a list of adjacency lists :)
struct node {
int vertex;
node* next;
adjList m_neighbors;
};
struct graph{
//List of nodes
};
I would recommend the more general and simple approach of using vector and pairs:
#include <utility>
#include <vector>
typedef std::pair<int, int> ii; /* the first int is for the data, and the second is for the weight of the Edge - Mostly usable for Dijkstra */
typedef std::vector<ii> vii;
typedef std::vector<int> vi;
typedef std::vector <vii> WeightedAdjList; /* Usable for Dijkstra -for example */
typedef std::vector<vi> AdjList; /*use this one for DFS/BFS */
Or alias style (>=C++11):
using ii = std::pair<int,int>;
using vii = std::vector<ii>;
using vi = std::vector<int>;
using WeightedAdjList = std::vector<vii>;
using AdjList = std::vector<vi>;
From here:
using vector and pairs (from tejas's answer)
For additional information you can refer to a very good summary of topcoder:
Power up c++ with STL
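As an illustration of how the AdjList alias might be used for BFS, here is a short sketch. The function name bfs_order is hypothetical, and the sketch assumes vertices are numbered 0 to N-1 and that start is a valid vertex.

```cpp
#include <queue>
#include <vector>

using vi = std::vector<int>;
using AdjList = std::vector<vi>;

// Return the vertices reachable from `start`, in breadth-first order.
std::vector<int> bfs_order(const AdjList& graph, int start) {
    std::vector<bool> seen(graph.size(), false);
    std::vector<int> order;
    std::queue<int> pending;
    seen[start] = true;
    pending.push(start);
    while (!pending.empty()) {
        int u = pending.front();
        pending.pop();
        order.push_back(u);
        for (int v : graph[u]) {   // graph[u] lists the vertices u points to
            if (!seen[v]) {
                seen[v] = true;
                pending.push(v);
            }
        }
    }
    return order;
}
```

For the graph in the question (0-->1-->3, 1-->2, 2-->4), the list would be built as AdjList g = {{1, 3}, {2}, {4}, {}, {}}; and the index of each inner vector is the vertex itself, so there is no need to store it as the first element.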
My approach would be to use a hash map to store the list of nodes in the graph
class Graph {
private:
unordered_map<uint64_t, Node> nodeList;
...
};
The map takes the node ID as key, and the node itself as value. This way you could search for a node in the graph in constant time.
The node contains the adjacency list, in this case as a c++11 vector. It could also be a linked list, although for this use case I would not see a difference in efficiency. Maybe the list would be better if you would like to keep it sorted somehow.
class Node{
uint64_t id; // Node ID
vector<uint64_t> adjList;
...
};
With this approach you have to go through the adjacency list and then search the map on the ID to get the node.
As an alternative, you could have a vector of pointers to the neighbor nodes themselves. That would give you direct access to the neighbor nodes, but then you could not use a map to keep all your nodes in the graph, and you would lose the ability to search entries easily in your graph.
As you can see, there are a lot of trade-off decisions you have to make when implementing a graph; it all depends on your use cases.
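A rough sketch of the ID-based variant, showing the "go through the adjacency list, then search the map on the ID" step. The member functions addNode, addEdge and neighbors are hypothetical additions for illustration, and the members are made public here to keep the sketch short.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Node {
    std::uint64_t id;                    // Node ID
    std::vector<std::uint64_t> adjList;  // IDs of the nodes this node points to
};

class Graph {
public:
    void addNode(std::uint64_t id) { nodeList[id].id = id; }

    // Add a directed edge; both endpoints are created if missing.
    void addEdge(std::uint64_t from, std::uint64_t to) {
        addNode(from);
        addNode(to);
        nodeList[from].adjList.push_back(to);
    }

    // Resolve each neighbor ID back to its Node through the map.
    std::vector<const Node*> neighbors(std::uint64_t id) const {
        std::vector<const Node*> result;
        auto it = nodeList.find(id);
        if (it == nodeList.end()) return result;
        for (std::uint64_t neighborId : it->second.adjList) {
            auto found = nodeList.find(neighborId);
            if (found != nodeList.end()) result.push_back(&found->second);
        }
        return result;
    }

private:
    std::unordered_map<std::uint64_t, Node> nodeList;
};
```

Each neighbor costs one extra hash lookup, which is the price paid for being able to find any node by ID.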
I have a vector of pointers-to-structs.
std::vector<REVOCATION_LIST_BLOCK*> rl_block;
The struct is defined in a header...
struct REVOCATION_LIST_BLOCK
{
int number_of_entries;
REVOCATION_LIST_ENTRY *entries;
SIGDATA signature;
};
Later, the vector has one entry added, like this:
REVOCATION_LIST_BLOCK *new_rl_block = new REVOCATION_LIST_BLOCK;
new_rl_block->number_of_entries = 6;
rl_block.push_back(new_rl_block);
Later, I want to access the data within rl_block:
std::vector<REVOCATION_LIST_BLOCK*>::iterator it;
it = rl_block.begin();
number = (*it)->number_of_entries;
But this gets me the 'ol assertion:
vector iterator not dereferencable
Note, however, that I am able to do this, and get the value '6' back, as expected:
REVOCATION_LIST_BLOCK *block = rl_block[0];
number = block->number_of_entries;
What am I missing with the iterator way?
EDIT
Thanks folks for your comments. The rl_block is part of another struct (this code uses a complicated set of nested structs). If I write a bit of test code that does not have the rl_block inside of another struct, everything works as expected.
As a result, I modified the rl_block definition to:
std::vector<REVOCATION_LIST_BLOCK*> *rl_block;
and then allocate the vector itself dynamically. This works. But why?
Digging further, I found the issue --- a lousy memset() of the containing structure to zero-out the memory. This works for P.O.D., but not for the vectors, of course.
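For illustration, a sketch of zeroing such a structure without memset. Container, version and reset are hypothetical names; the real containing struct is not shown in the question.

```cpp
#include <vector>

struct REVOCATION_LIST_BLOCK;  // only pointers are stored, so a declaration suffices

// Hypothetical stand-in for the containing struct.
struct Container {
    int version;
    std::vector<REVOCATION_LIST_BLOCK*> rl_block;

    // Initialise members explicitly instead of memset(this, 0, sizeof(*this)),
    // which would overwrite the vector's internal state.
    Container() : version(0), rl_block() {}
};

// Resetting: assign a freshly constructed object rather than zeroing the bytes.
void reset(Container& c) {
    c = Container();
}
```

The P.O.D. members still end up zeroed, while the vector is constructed and assigned through its own member functions.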
The underlying data structure I am using is:
map<int, Cell>
struct Cell{ char c; Cell*next; };
In effect the data structure maps an int to a linked list. The map(in this case implemented as a hashmap) ensures that finding a value in the list runs in constant time. The Linked List ensures that insertion and deletion also run in constant time. At each processing iteration I am doing something like:
Cell *cellPointer1 = new Cell;
//Process cells, build linked list
Once the list is built I put the Cell elements in the map. The structure was working just fine, and at the end of my program I deallocate the memory, calling this for each Cell in the list:
delete cellPointer1;
But at the end of my program I have a memory leak!!
To test memory leak I use:
#include <stdlib.h>
#include <crtdbg.h>
#define _CRTDBG_MAP_ALLOC
_CrtDumpMemoryLeaks();
I'm thinking that somewhere along the way the fact that I am putting the Cells in the map does not allow me to deallocate the memory correctly. Does anyone have any ideas on how to solve this problem?
We'll need to see your code for insertion and deletion to be sure about it.
What I'd see as a memleak-free insert / remove code would be:
( NOTE: I'm assuming you don't store the Cells that you allocate in the map )
//
// insert
//
std::map<int, Cell> _map;
Cell a = Cell(); // no new here! value-initialised, so a.next starts out as NULL
Cell *iter = &a;
while( condition )
{
Cell *b = new Cell();
iter->next = b;
iter = b;
}
_map[id] = a; // will 'copy' a into the container slot of the map
//
// cleanup:
//
std::map<int,Cell>::iterator i = _map.begin();
while( i != _map.end() )
{
Cell &a = i->second;
Cell *iter = a.next; // list of cells associated to 'a'.
while( iter != NULL )
{
Cell *to_delete = iter;
iter = iter->next;
delete to_delete;
}
_map.erase(i++); // removes the Cell from the map, no need to 'delete'; i moves on before the erased iterator is invalidated
}
Edit: there was a comment indicating that I might not have understood the problem completely. If you insert ALL the cells you allocate in the map, then the faulty thing is that your map contains Cell, not Cell*.
If you define your map as std::map<int, Cell *>, your problem would be solved under 2 conditions:
you insert all the Cells that you allocate in the map
the integer (the key) associated to each cell is unique (important!!)
Now the deletion is simply a matter of:
std::map<int, Cell*>::iterator i = _map.begin();
while( i != _map.end() )
{
Cell *c = i->second;
if ( c != NULL ) delete c;
++i; // advance, otherwise the loop never terminates
}
_map.clear();
I've built almost the exact same hybrid data structure you are after (list/map with the same algorithmic complexity if I were to use unordered_map instead) and have been using it from time to time for almost a decade though it's a kind of bulky structure (something I'd use with convenience in mind more than efficiency).
It's worth noting that this is quite different from just using std::unordered_map directly. For a start, it preserves the original order in which one inserts elements. Insertion, removal, and searches are guaranteed to happen in logarithmic time (or constant time depending on whether key searching is involved and whether you use a hash table or BST), iterators do not get invalidated on insertion/removal (the main requirement I needed which made me favor std::map over std::unordered_map), etc.
The way I did it was like this:
// I use this as the iterator for my container with
// the list being the main 'focal point' while I
// treat the map as a secondary structure to accelerate
// key searches.
typedef typename std::list<Value>::iterator iterator;
// Values are stored in the list.
std::list<Value> data;
// Keys and iterators into the list are stored in a map.
std::map<Key, iterator> accelerator;
If you do it like this, it becomes quite easy. push_back is a matter of pushing back to the list and adding an iterator to the new last element to the map. Iterator removal is a matter of removing the key associated with the iterator from the map before removing the element from the list. Finding a key is a matter of searching the map and returning the associated value, which happens to be the list iterator. Key removal is just finding a key and then doing iterator removal, and so on.
If you want to improve all methods to constant time, then you can use std::unordered_map instead of std::map as I did here (though that comes with some caveats).
Taking an approach like this should simplify things considerably over an intrusive list-based solution where you're manually having to free memory.
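A compact sketch of those operations, for illustration only. OrderedMap, Entry and erase_key are hypothetical names, and storing the key next to the value in the list is an assumption made here so that iterator removal can find the map entry; the answer's own container keeps plain Values in the list.

```cpp
#include <cstddef>
#include <list>
#include <map>
#include <utility>

template <typename Key, typename Value>
class OrderedMap {
public:
    typedef std::pair<Key, Value> Entry;  // assumption: key stored beside the value
    typedef typename std::list<Entry>::iterator iterator;

    iterator begin() { return data.begin(); }
    iterator end() { return data.end(); }
    std::size_t size() const { return data.size(); }

    // Append in insertion order; an existing key is left untouched.
    std::pair<iterator, bool> push_back(const Key& key, const Value& value) {
        typename std::map<Key, iterator>::iterator found = accelerator.find(key);
        if (found != accelerator.end()) return std::make_pair(found->second, false);
        data.push_back(Entry(key, value));
        iterator last = data.end();
        --last;
        accelerator[key] = last;
        return std::make_pair(last, true);
    }

    // Key search goes through the map and yields a list iterator.
    iterator find(const Key& key) {
        typename std::map<Key, iterator>::iterator found = accelerator.find(key);
        return found == accelerator.end() ? data.end() : found->second;
    }

    // Iterator removal: drop the key from the map, then the list element.
    iterator erase(iterator pos) {
        accelerator.erase(pos->first);
        return data.erase(pos);
    }

    // Key removal: find the key, then do iterator removal.
    bool erase_key(const Key& key) {
        iterator pos = find(key);
        if (pos == data.end()) return false;
        erase(pos);
        return true;
    }

private:
    std::list<Entry> data;                // values, in insertion order
    std::map<Key, iterator> accelerator;  // key -> position in the list
};
```

No element is allocated or freed by hand; the list owns the values and the map only holds iterators into it, which stay valid because std::list does not move its elements.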
Is there a reason why you are not using built-in containers like, say, STL?
Anyhow, you don't show the code where the allocation takes place, nor the map definition (is this coming from a library?).
Are you sure you deallocate all of the previously allocated Cells, starting from the last one and going backwards up to the first?
You could do this using the STL (remove next from Cell):
std::unordered_map<int,std::list<Cell>>
Or if cell only contains a char
std::unordered_map<int,std::string>
If your compiler doesn't support std::unordered_map then try boost::unordered_map.
If you really want to use intrusive data structures, have a look at Boost Intrusive.
As others have pointed out, it may be hard to see what you're doing wrong without seeing your code.
Someone should mention, however, that you're not helping yourself by overlaying two container types here.
If you're using a hash_map, you already have constant insertion and deletion time, see the related Hash : How does it work internally? post. The only exception to the O(1) lookup time is if your implementation decides to resize the container, in which case you have added overhead regardless of your linked list addition. Having two addressing schemes is only going to make things slower (not to mention buggier).
Sorry if this doesn't point you to the memory leak, but I'm sure a lot of memory leaks / bugs come from not using stl / boost containers to their full potential. Look into that first.
You need to be very careful with what you are doing, because values in a C++ map need to be copyable and with your structure that has raw pointers, you must handle your copy semantics properly.
You would be far better off using std::list where you won't need to worry about your copy semantics.
If you can't change that then at least std::map<int, Cell*> will be a bit more manageable, although you would have to manage the pointers in your map because std::map will not manage them for you.
You could of course use std::map<int, shared_ptr<Cell> >, probably easiest for you for now.
If you also use shared_ptr within your Cell object itself, you will need to beware of circular references, and as Cell will know it's being shared_ptr'd you could derive it from enable_shared_from_this
My final point will be that list is very rarely the correct collection type to use. It is the correct one to use sometimes, especially when you have an LRU cache situation and you want to move accessed elements to the end of the list fast. However that is the minority case and it probably doesn't apply here. Think of an alternative collection you really want. map< int, set<char> > perhaps? or map< int, vector< char > > ?
Your list has a lot of overheads to store a few chars
Assuming a map where you want to preserve existing entries. 20% of the time, the entry you are inserting is new data. Is there an advantage to doing std::map::find then std::map::insert using that returned iterator? Or is it quicker to attempt the insert and then act based on whether or not the iterator indicates the record was or was not inserted?
The answer is you do neither. Instead you want to do something suggested by Item 24 of Effective STL by Scott Meyers:
typedef map<int, int> MapType; // Your map type may vary, just change the typedef
MapType mymap;
// Add elements to map here
int k = 4; // assume we're searching for keys equal to 4
int v = 0; // assume we want the value 0 associated with the key of 4
MapType::iterator lb = mymap.lower_bound(k);
if(lb != mymap.end() && !(mymap.key_comp()(k, lb->first)))
{
// key already exists
// update lb->second if you care to
}
else
{
// the key does not exist in the map
// add it to the map
mymap.insert(lb, MapType::value_type(k, v)); // Use lb as a hint to insert,
// so it can avoid another lookup
}
The answer to this question also depends on how expensive it is to create the value type you're storing in the map:
typedef std::map <int, int> MapOfInts;
typedef std::pair <MapOfInts::iterator, bool> IResult;
void foo (MapOfInts & m, int k, int v) {
IResult ir = m.insert (std::make_pair (k, v));
if (ir.second) {
// insertion took place (ie. new entry)
}
else if ( replaceEntry ( ir.first->first ) ) {
ir.first->second = v;
}
}
For a value type such as an int, the above will be more efficient than a find followed by an insert (in the absence of compiler optimizations). As stated above, this is because the search through the map only takes place once.
However, the call to insert requires that you already have the new "value" constructed:
class LargeDataType { /* ... */ };
typedef std::map <int, LargeDataType> MapOfLargeDataType;
typedef std::pair <MapOfLargeDataType::iterator, bool> IResult;
void foo (MapOfLargeDataType & m, int k) {
// This call is more expensive than a find through the map:
LargeDataType const & v = VeryExpensiveCall ( /* ... */ );
IResult ir = m.insert (std::make_pair (k, v));
if (ir.second) {
// insertion took place (ie. new entry)
}
else if ( replaceEntry ( ir.first->first ) ) {
ir.first->second = v;
}
}
In order to call 'insert' we are paying for the expensive call to construct our value type - and from what you said in the question you won't use this new value 20% of the time. In the above case, if changing the map value type is not an option then it is more efficient to first perform the 'find' to check if we need to construct the element.
Alternatively, the value type of the map can be changed to store handles to the data using your favourite smart pointer type. The call to insert uses a null pointer (very cheap to construct) and only if necessary is the new data type constructed.
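The handle-based alternative might look roughly like this. std::shared_ptr is just one possible smart pointer, insert_if_absent and expensive_calls are hypothetical names, and VeryExpensiveCall is given a trivial stand-in body so the sketch compiles.

```cpp
#include <map>
#include <memory>
#include <utility>

struct LargeDataType { int payload; };

// Stand-in for the expensive construction; the counter only makes the
// number of calls observable.
int expensive_calls = 0;
LargeDataType VeryExpensiveCall(int k) {
    ++expensive_calls;
    return LargeDataType{k * 10};
}

typedef std::map<int, std::shared_ptr<LargeDataType>> MapOfHandles;

// Returns true if a new entry was created.
bool insert_if_absent(MapOfHandles& m, int k) {
    // Inserting a null handle is cheap and searches the map only once.
    std::pair<MapOfHandles::iterator, bool> ir =
        m.insert(std::make_pair(k, std::shared_ptr<LargeDataType>()));
    if (ir.second) {
        // Only a genuinely new key pays for the expensive construction.
        ir.first->second = std::make_shared<LargeDataType>(VeryExpensiveCall(k));
    }
    return ir.second;
}
```

With this shape, the cases where the key already exists cost one lookup and one null-pointer construction, and the existing entry is preserved.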
There will be barely any difference in speed between the two: find will return an iterator, and insert does the same and will search the map anyway to determine if the entry already exists.
So.. it's down to personal preference. I always try insert and then update if necessary, but some people don't like handling the pair that is returned.
I would think if you do a find then insert, the extra cost would be when you don't find the key and performing the insert after. It's sort of like looking through books in alphabetical order and not finding the book, then looking through the books again to see where to insert it. It boils down to how you will be handling the keys and if they are constantly changing. Now there is some flexibility in that if you don't find it, you can log, exception, do whatever you want...
If you are concerned about efficiency, you may want to check out hash_map<>.
Typically map<> is implemented as a binary tree. Depending on your needs, a hash_map may be more efficient.
I don't seem to have enough points to leave a comment, but the ticked answer seems to be long winded to me - when you consider that insert returns the iterator anyway, why go searching lower_bound, when you can just use the iterator returned. Strange.
Any answers about efficiency will depend on the exact implementation of your STL. The only way to know for sure is to benchmark it both ways. I'd guess that the difference is unlikely to be significant, so decide based on the style you prefer.
map[ key ] - let stl sort it out. That's communicating your intention most effectively.
Yeah, fair enough.
If you do a find and then an insert you're performing 2 x O(log N) when you get a miss as the find only lets you know if you need to insert not where the insert should go (lower_bound might help you there). Just a straight insert and then examining the result is the way that I'd go.