Suppose I own a list of edges saved inside a vector like:
typedef struct edge
{
int v;
size_t start;
size_t end;
}e;
typedef vector<list<e>> adj_list;
adj_list tree;
I have to run some logic on this tree object, but the logic is too complicated to do in place (and I am not allowed to recurse), so I need an extra data structure to handle each node. As a simple example, let's consider incrementing each edge's v value:
list<e> aux;
aux.insert(aux.begin(), tree[0].begin(), tree[0].end());
while (!aux.empty())
{
e& now = aux.front();
aux.pop_front();
now.v++;
aux.insert(aux.begin(), tree[now.v].begin(), tree[now.v].end());
}
The problem is that changes made through the now variable are not reflected in tree, because aux holds copies. I need some container (it can be any kind: vector, linked list, queue, stack, as long as it has an empty() check, like the worklist in Dijkstra's algorithm) to handle the edge objects that live in tree. Is there an elegant way to do this? Can I use a list of iterators? I'm specifically asking for an "elegant" approach in the hope that it does not involve pointers.
As discussed in the comments, the solution is to store iterators instead of copies, e.g.:
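list<list<e>::iterator> aux;
// Store the iterators themselves; a range insert would try to copy the edges.
for (auto it = tree[0].begin(); it != tree[0].end(); ++it)
    aux.push_back(it);
while (!aux.empty())
{
    e& now = *(aux.front());
    aux.pop_front();
    now.v++;
    auto pos = aux.begin(); // inserting before pos keeps the edges in order
    for (auto it = tree[now.v].begin(); it != tree[now.v].end(); ++it)
        aux.insert(pos, it);
}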
This works only if you can guarantee that nothing invalidates the stored iterators, which certain operations on tree can do (for example, erasing an edge from one of the inner lists).
As pointed out by n. 'pronouns' m., iterators can be considered "generalized pointers", so many problems that regular pointers have also apply to iterators.
Another (slightly safer) approach would be to store std::shared_ptrs in the inner lists of tree. You can then simply store another std::shared_ptr to the same object in aux, which makes sure that the object cannot be accidentally deleted while it is still being referenced.
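A minimal sketch of the shared_ptr variant, assuming the inner lists hold std::shared_ptr<e> instead of e. The names shared_adj_list and bump_all are made up for illustration, and the traversal follows each edge's end field as the next node index (an assumption, chosen so that the example terminates on a small tree):

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <vector>

struct e {
    int v;
    std::size_t start;
    std::size_t end;
};

// Hypothetical variant of adj_list: the inner lists share ownership of edges.
using shared_adj_list = std::vector<std::list<std::shared_ptr<e>>>;

// Increments v on every edge reachable from node 0, without recursion.
// Assumes the graph is a tree (no cycles) and every end is < tree.size().
void bump_all(shared_adj_list& tree) {
    if (tree.empty()) return;
    std::list<std::shared_ptr<e>> aux(tree[0].begin(), tree[0].end());
    while (!aux.empty()) {
        std::shared_ptr<e> now = aux.front();  // co-owns the edge, no copy of e
        aux.pop_front();
        now->v++;                              // modifies the edge stored in tree
        const auto& next = tree[now->end];
        aux.insert(aux.begin(), next.begin(), next.end());
    }
}
```

Because aux holds shared_ptrs, an edge that is removed from tree while it is still queued stays alive until aux lets go of it.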
This question is about owning pointers, consuming pointers, smart pointers, vectors, and allocators.
I am a little bit lost in my thoughts about code architecture. If this question already has an answer somewhere: 1. sorry, I haven't found a satisfying answer so far, and 2. please point me to it.
My problem is the following:
I have several "things" stored in a vector and several "consumers" of those "things". So, my first try was as follows:
std::vector<thing> i_am_the_owner_of_things;
thing* get_thing_for_consumer() {
// some thing-selection logic
return &i_am_the_owner_of_things[5]; // 5 is just an example
}
...
// somewhere else in the code:
class consumer {
consumer() {
m_thing = get_thing_for_consumer();
}
thing* m_thing;
};
In my application, this would be safe because the "things" outlive the "consumers" in any case. However, more "things" can be added during runtime and that can become a problem because if the std::vector<thing> i_am_the_owner_of_things; gets reallocated, all the thing* m_thing pointers become invalid.
A fix for this scenario would be to store unique pointers to "things" instead of "things" directly, i.e. as follows:
std::vector<std::unique_ptr<thing>> i_am_the_owner_of_things;
thing* get_thing_for_consumer() {
// some thing-selection logic
return i_am_the_owner_of_things[5].get(); // 5 is just an example
}
...
// somewhere else in the code:
class consumer {
consumer() {
m_thing = get_thing_for_consumer();
}
thing* m_thing;
};
The downside here is that memory coherency between "things" is lost. Can this memory coherency be re-established by using custom allocators somehow? I am thinking of something like an allocator which would always allocate memory for, e.g., 10 elements at a time and whenever required, adds more 10-elements-sized chunks of memory.
Example:
initially:
v = ☐☐☐☐☐☐☐☐☐☐
more elements:
v = ☐☐☐☐☐☐☐☐☐☐ 🡒 ☐☐☐☐☐☐☐☐☐☐
and again:
v = ☐☐☐☐☐☐☐☐☐☐ 🡒 ☐☐☐☐☐☐☐☐☐☐ 🡒 ☐☐☐☐☐☐☐☐☐☐
Using such an allocator, I wouldn't even have to use std::unique_ptrs of "things" because at std::vector's reallocation time, the memory addresses of the already existing elements would not change.
As an alternative, I can only think of referencing the "thing" in "consumer" via a std::shared_ptr<thing> m_thing, as opposed to the current thing* m_thing, but that seems like the worst approach to me, because a "thing" shall not own a "consumer" and with shared pointers I would create shared ownership.
So, is the allocator-approach a good one? And if so, how can it be done? Do I have to implement the allocator by myself or is there an existing one?
If you are able to treat thing as a value type, do so. It simplifies things, you don't need a smart pointer for circumventing the pointer/reference invalidation issue. The latter can be tackled differently:
If new thing instances are inserted via push_front and push_back during the program, use std::deque instead of std::vector. Then no pointers or references to elements in this container are invalidated (iterators are invalidated, though; thanks to @odyss-jii for pointing that out). If you fear that you rely heavily on the performance benefit of the completely contiguous memory layout of std::vector, create a benchmark and profile.
If new thing instances are inserted in the middle of the container during the program, consider using std::list. No pointers, iterators, or references are invalidated when inserting or removing container elements (except those referring to an element that is itself removed). Iteration over a std::list is much slower than over a std::vector, but make sure this is an actual issue in your scenario before worrying too much about that.
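The stability guarantees above can be checked with a small sketch. The thing struct and both function names are placeholders for illustration, not part of the question's code:

```cpp
#include <cassert>
#include <deque>
#include <iterator>
#include <list>

struct thing { int value; };

// Returns true if a pointer taken before growth still refers to the same
// element after many push_back/push_front calls on a std::deque.
bool deque_keeps_addresses() {
    std::deque<thing> things;
    things.push_back(thing{42});
    thing* p = &things.front();          // pointer held by a "consumer"
    for (int i = 0; i < 10000; ++i) {
        things.push_back(thing{i});      // growth at either end ...
        things.push_front(thing{-i});    // ... leaves existing elements in place
    }
    // After 10000 push_front calls the original element sits at index 10000.
    return p->value == 42 && p == &things[10000];
}

// Returns true if a pointer survives an insertion in the middle of a std::list.
bool list_keeps_addresses() {
    std::list<thing> things{thing{1}, thing{3}};
    thing* p = &things.back();
    things.insert(std::next(things.begin()), thing{2});
    return p == &things.back() && p->value == 3;
}
```

The same loop with std::vector would leave p dangling as soon as the vector reallocates.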
There is no single right answer to this question, since it depends a lot on the exact access patterns and desired performance characteristics.
Having said that, here is my recommendation:
Continue storing the data contiguously as you are, but do not store aliasing pointers to that data. Instead, consider a safer alternative (this is a proven method) where you fetch the pointer based on an ID right before using it -- as a side-note, in a multi-threaded application you can lock attempts to resize the underlying store whilst such a weak reference lives.
So your consumer will store an ID, and will fetch a pointer to the data from the "store" on demand. This also gives you control over all "fetches", so that you can track them, implement safety measures, etc.
void consumer::foo() {
thing *t = m_thing_store.get(m_thing_id);
if (t) {
// do something with t
}
}
Or a more advanced alternative to help with synchronization in a multi-threaded scenario:
void consumer::foo() {
reference<thing> t = m_thing_store.get(m_thing_id);
if (!t.empty()) {
// do something with t
}
}
Where reference would be some thread-safe RAII "weak pointer".
There are multiple ways of implementing this. You can either use an open-addressing hash table and use the ID as a key; this will give you roughly O(1) access time if you balance it properly.
Another alternative (best-case O(1), worst-case O(N)) is to use a "reference" structure, with a 32-bit ID and a 32-bit index (so same size as 64-bit pointer) -- the index serves as a sort-of cache. When you fetch, you first try the index, if the element in the index has the expected ID you are done. Otherwise, you get a "cache miss" and you do a linear scan of the store to find the element based on ID, and then you store the last-known index value in your reference.
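The ID-plus-cached-index idea can be sketched roughly as follows. The names thing_ref and thing_store, and the choice of a plain std::vector as the store, are assumptions made for illustration only:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct thing {
    std::uint32_t id;   // stable identifier, assumed unique within the store
    int payload;
};

// Hypothetical "reference": a 32-bit ID plus a 32-bit index used as a cache.
struct thing_ref {
    std::uint32_t id;
    std::uint32_t index;
};

class thing_store {
public:
    thing_ref add(std::uint32_t id, int payload) {
        things_.push_back(thing{id, payload});
        return thing_ref{id, static_cast<std::uint32_t>(things_.size() - 1)};
    }

    // Removes by ID; later elements shift down, so cached indices go stale.
    void remove(std::uint32_t id) {
        for (std::size_t i = 0; i < things_.size(); ++i) {
            if (things_[i].id == id) {
                things_.erase(things_.begin() + static_cast<std::ptrdiff_t>(i));
                return;
            }
        }
    }

    // Fast path: the cached index still holds the expected ID.
    // Slow path ("cache miss"): linear scan, then refresh the cached index.
    // Returns nullptr if no element has that ID.
    thing* get(thing_ref& ref) {
        if (ref.index < things_.size() && things_[ref.index].id == ref.id)
            return &things_[ref.index];
        for (std::size_t i = 0; i < things_.size(); ++i) {
            if (things_[i].id == ref.id) {
                ref.index = static_cast<std::uint32_t>(i);
                return &things_[i];
            }
        }
        return nullptr;
    }

private:
    std::vector<thing> things_;
};
```

The pointer returned by get is meant to be used immediately and not stored; the consumer keeps only the thing_ref.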
IMO the best approach would be to create a new container that behaves in a safe way.
Pros:
the change is made at a separate level of abstraction
changes to old code are minimal (just replace std::vector with the new container)
it is the "clean code" way to do it
Cons:
it may look like there is a bit more work to do
Another answer proposes std::list, which will do the job but with a larger number of allocations and slower random access. So IMO it is better to compose your own container from a couple of std::vectors.
So it may start to look more or less like this (minimal example):
template<typename T>
class cluster_vector
{
public:
    static constexpr size_t cluster_size = 16;
    cluster_vector() {
        clusters.reserve(1024);
        add_cluster();
    }
    ...
    size_t size() const {
        if (clusters.empty()) return 0;
        return (clusters.size() - 1) * cluster_size + clusters.back().size();
    }
    T& operator[](size_t index) {
        throwIfIndexTooBig(index);
        return clusters[index / cluster_size][index % cluster_size];
    }
    void push_back(T&& x) {
        if_last_is_full_add_cluster();
        // T&& is a plain rvalue reference here, so move rather than forward
        clusters.back().push_back(std::move(x));
    }
private:
    void throwIfIndexTooBig(size_t index) const {
        if (index >= size()) {
            throw std::out_of_range("cluster_vector out of range");
        }
    }
    void add_cluster() {
        clusters.push_back({});
        // reserving up front means a cluster never reallocates its elements
        clusters.back().reserve(cluster_size);
    }
    void if_last_is_full_add_cluster() {
        if (clusters.back().size() == cluster_size) {
            add_cluster();
        }
    }
private:
    std::vector<std::vector<T>> clusters;
};
This way you provide a container that never relocates its items once they are stored. It doesn't matter what T does.
[A shared pointer] seems like the worst approach to me, because a "thing" shall not own a "consumer" and with shared pointers I would create shared ownership.
So what? Maybe the code is a little less self-documenting, but it will solve all your problems.
(And by the way you are muddling things by using the word "consumer", which in a traditional producer/consumer paradigm would take ownership.)
Also, returning a raw pointer in your current code is already entirely ambiguous as to ownership. In general, I'd say it's good practice to avoid raw pointers if you can (that way you don't need to call delete). I would return a reference if you go with unique_ptr:
std::vector<std::unique_ptr<thing>> i_am_the_owner_of_things;
thing& get_thing_for_consumer() {
// some thing-selection logic
return *i_am_the_owner_of_things[5]; // 5 is just an example
}
Background
I want to manipulate a copy of a vector, but copying every element of a vector is normally an expensive operation.
There is a concept called shallow copy, which I read somewhere is the default copy constructor behavior. However, it does not seem to work that way: when I copy a vector object, the result looks like a deep copy.
struct Vertex{
int label;
Vertex(int label):label(label){ }
};
int main(){
vector<Vertex> vertices { Vertex(0), Vertex(1) };
// I Couldn't force this to be vector<Vertex*>
vector<Vertex> myvertices(vertices);
myvertices[1].label = 123;
std::cout << vertices[1].label << endl;
// OUTPUT: 1 (meaning object is deeply copied)
return 0;
}
Naive Solution: for pointer copy.
int main(){
vector<Vertex> vertices { Vertex(0), Vertex(1) };
vector<Vertex*> myvertices;
for (auto it = vertices.begin(); it != vertices.end(); ++it){
myvertices.push_back(&*it);
}
myvertices[1]->label = 123; // the elements are pointers, so use ->
std::cout << vertices[1].label << endl;
// OUTPUT: 123 (meaning object is not copied, just the pointer)
return 0;
}
Improvement
Is there a better approach, or a std::vector API, to construct a new vector containing just a pointer to each of the elements in the original vector?
One way to turn a vector of elements into a vector of pointers to those elements is std::transform. Compared to your example it is more efficient, because the vector of pointers is sized up front rather than grown by repeated push_back, and IMHO it is more elegant:
std::vector<Vertex*> myvertices(vertices.size());
std::transform(vertices.begin(), vertices.end(), myvertices.begin(), [](Vertex &v) { return &v; });
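Or, if you don't want to use a lambda for the unary operation (note that naming std::addressof<Vertex> this way may fail to compile from C++17 on, where a deleted rvalue overload makes the name ambiguous, so the lambda is the more portable form):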
std::vector<Vertex*> myvertices(vertices.size());
std::transform(vertices.begin(), vertices.end(), myvertices.begin(), std::addressof<Vertex>);
Caution: If you alter the original vector then you invalidate the pointers in the pointers' vector.
Thanks to @kfsone for pointing out the main issue: it is unusual to want to keep pointers into another vector of objects without thinking about the underlying goal. He suggested an alternative approach that solves a similar problem using bit masking. It was not obvious to me until he mentioned it.
When we store just pointers into another vector, we most probably want to do some bookkeeping on the other objects, which is later performed on the pointers themselves without touching the original data. In my case, I'm solving a minimum vertex cover problem with a brute-force approach. I need to generate all subsets of the vertices (e.g. 20 vertices generate 2**20, about 1 million, subsets), then trim the irrelevant ones by iterating over the vertices in each candidate cover and removing the edges they cover. My first intuition was to copy all the pointers for efficiency, so that I could later remove the pointers one by one.
However, another way of looking at this problem is not to use a vector/set at all, but rather to keep track of those pointers as a bit pattern. I won't go into detail, but feel free to learn from others.
The performance difference is very significant: with bitwise operations you can test or update membership in O(1) constant time, whereas with a container you tend to iterate over the elements, which binds that step to O(n). To make it worse, if you are brute-forcing an NP-hard problem you need to keep the constant factor as low as possible, and going from O(1) to O(N) is a huge difference in such a scenario.
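For illustration, a brute-force vertex cover with subsets encoded as bit patterns might look like the sketch below. This is not @kfsone's code; the function names and the edge representation (pairs of vertex indices) are assumptions:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Counts set bits, i.e. the number of vertices in the subset.
int popcount32(std::uint32_t x) {
    int n = 0;
    while (x) { x &= x - 1; ++n; }
    return n;
}

// Brute-force minimum vertex cover over n vertices (practical for n around 20).
// Bit i of a mask is set when vertex i is in the candidate cover.
// Assumes every edge endpoint is in [0, n). Returns the mask of a smallest cover.
std::uint32_t min_vertex_cover(int n, const std::vector<std::pair<int, int>>& edges) {
    assert(n >= 0 && n < 32);
    const std::uint32_t all = (1u << n) - 1;   // every vertex selected
    std::uint32_t best = all;                  // always a valid cover
    for (std::uint32_t mask = 0; mask <= all; ++mask) {
        if (popcount32(mask) >= popcount32(best)) continue;  // cannot improve
        bool covers = true;
        for (const auto& edge : edges) {
            // Membership is one shift and AND per endpoint, no container lookup.
            if (!((mask >> edge.first) & 1u) && !((mask >> edge.second) & 1u)) {
                covers = false;
                break;
            }
        }
        if (covers) best = mask;
    }
    return best;
}
```

Each candidate subset is a single integer, so generating the next one is just ++mask, and no vector of pointers has to be copied or trimmed.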
#include <stdio.h>
class XObject
{
public:
    int id;
    char *type;
};
class XSubObject : public XObject
{
public:
    int remark;
    char *place;
};
Sorry for my bad example, but the data looks more or less like this.
std::vector<XObject> objects;
The data stored in objects looks like this:
#1=XObject(1001,"chair"), #2=XObject(1002,"spoon"), #3=XSubObject(1004,"table",2,"center"), #4=XSubObject(1005,"item",0,"left") and so..on
We can have different XObjects with the same type.
class XStructure
{
public:
    XObject parent;
};
class XStructureRow
{
public:
    XObject child;
    XStructure parentStruct;
};
std::vector<XStructure> structures;
The data stored in structures looks like this:
#5=XStructure(NULL), #7=XStructure(#1),#8=XStructure(#2),#9=XStructure(#3),#10=XStructure(#4) and so..on
std::vector<XStructureRow> structurerows;
The data stored in structurerows looks like this:
XStructureRow(#4,#5), XStructureRow(#2,#1),XStructureRow(#2,#7),XStructureRow(#3,#10),XStructureRow(#4,#8) and so..on
How can I write a fast algorithm that starts with an XObject, finds the structure row it appears in, fetches that row's structure, and fetches its parent? For example, I want to retrieve all the parents of the object with name "table", and retrieve its parents with name "chair".
My written algorithm is:
std::vector<XObject> getParents(const std::string& name) // e.g. name = "chair"
{
    std::vector<XObject> objs;
    for (size_t i = 0; i < structurerows.size(); i++)
    {
        XStructureRow sr = structurerows[i];
        XStructure parent = sr.fetchParent();
        if (parent != NULL)
        {
            if (parent.fetchName() == name)
                objs.push_back(parent);
        }
    }
    return objs;
}
If I have to fetch the parents of all the objects, it takes too much time when the data is huge. Is there a solution that finds the parent objects in O(1) instead of iterating over the complete loop? I want to fetch these parents with minimal iterations. Here the complexity is O(n), which I am not satisfied with. I hope I made some valid points. Suggestions please.
A few suggestions.
First, your getParents() function is making multiple copies of objects and arrays. It constructs a new instance of vector called objs, fills it up with copies of items in the row. Then returns a copy of the array (which creates a new copy of each object, which creates copies of its members). That's likely the root cause of your performance problems.
Second your class hierarchy has classes with "child" and "parent" objects, but are storing copies of these XObject instances. So if you were to update one of these objects independently, all the parent and child objects you think are referring to them have a different copy. (And hence, will create some strange bugs later especially since the base classes contain pointers). Your object relationships in the class declarations should be via pointers, not instance copies.
Third, string comparisons during a lookup algorithm are also harsh on performance. You should represent your object's unique key by an integer if at all possible.
Not knowing anything else about your problem set, if you addressed those three things, you'd likely have better performance and wouldn't care about finding the O(1) solution.
Now to actually answer your question:
I would keep a map (or hash map) of arrays to track all the objects of a certain type. That is:
std::map<std::string, std::vector<XObject*>> lookupmap;
Then as each object is created, you can look up its type in "lookupmap" and add it:
void OnObjectCreated(XObject* pObj)
{
std::string strType(pObj->type);
lookupmap[strType].push_back(pObj);
}
I'll leave the choice between std::map and a hash map (std::unordered_map in standard C++) as an exercise for you.
The only way to "find" something with O(1) complexity is to use a hash table. Creating a hash value from a key and then accessing the object indexed in the table by that hash value has O(1) complexity on average. Any other search algorithm will at best be O(log n), as for a sorted list or a sorted tree-type structure.
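A hash-table index along these lines could be sketched as follows. TypeIndex and its member names are hypothetical, and std::string replaces the char* member of the question's XObject to keep ownership simple:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

struct XObject {
    int id;
    std::string type;   // assumption: std::string instead of the original char*
};

// Hypothetical index: type name -> pointers to every object of that type.
// The pointers stay valid only while the owning container does not reallocate.
class TypeIndex {
public:
    void onObjectCreated(const XObject* obj) {
        byType_[obj->type].push_back(obj);
    }

    // Average O(1) lookup of the bucket; returns an empty list for unknown types.
    const std::vector<const XObject*>& find(const std::string& type) const {
        static const std::vector<const XObject*> none;
        auto it = byType_.find(type);
        return it == byType_.end() ? none : it->second;
    }

private:
    std::unordered_map<std::string, std::vector<const XObject*>> byType_;
};
```

The index is built once, as objects are created, so a later query such as find("chair") no longer scans every row. The same pattern would apply to an index from a child object's id to its parents.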
The underlying data structure I am using is:
struct Cell { char c; Cell *next; };
map<int, Cell>
In effect the data structure maps an int to a linked list. The map(in this case implemented as a hashmap) ensures that finding a value in the list runs in constant time. The Linked List ensures that insertion and deletion also run in constant time. At each processing iteration I am doing something like:
Cell *cellPointer1 = new Cell;
//Process cells, build linked list
Once the list is built I put the Cell elements in the map. The structure was working just fine, and at the end of my program I deallocate the memory, for each Cell in the list:
delete cellPointer1;
But at the end of my program I have a memory leak!!
To test memory leak I use:
#define _CRTDBG_MAP_ALLOC // must come before the includes to take effect
#include <stdlib.h>
#include <crtdbg.h>
_CrtDumpMemoryLeaks();
I'm thinking that somewhere along the way the fact that I am putting the Cells in the map does not allow me to deallocate the memory correctly. Does anyone have any ideas on how to solve this problem?
We'll need to see your code for insertion and deletion to be sure about it.
What I'd see as a memleak-free insert / remove code would be:
( NOTE: I'm assuming you don't store the Cells that you allocate in the map )
//
// insert
//
std::map<int, Cell> _map;
Cell a = Cell(); // no new here! Value-initialized so that a.next starts out as NULL
Cell *iter = &a;
while( condition )
{
Cell *b = new Cell();
iter->next = b;
iter = b;
}
_map[id] = a; // will 'copy' a into the container slot of the map
//
// cleanup:
//
std::map<int,Cell>::iterator i = _map.begin();
while( i != _map.end() )
{
    Cell &a = i->second;
    Cell *iter = a.next; // list of cells associated to 'a'.
    while( iter != NULL )
    {
        Cell *to_delete = iter;
        iter = iter->next;
        delete to_delete;
    }
    // Remove the Cell from the map (no need to 'delete' it). Advance the
    // iterator before erasing, because erase invalidates the erased position.
    _map.erase(i++);
}
Edit: there was a comment indicating that I might not have understood the problem completely. If you insert ALL the cells you allocate into the map, then the fault is that your map contains Cell, not Cell*.
If you define your map as std::map<int, Cell *>, your problem would be solved under two conditions:
you insert all the Cells that you allocate into the map
the integer (the key) associated with each cell is unique (important!!)
Now the deletion is simply a matter of:
std::map<int, Cell*>::iterator i = _map.begin();
while( i != _map.end() )
{
    Cell *c = i->second;
    delete c; // deleting a null pointer is a no-op, so no check is needed
    ++i;      // advance, otherwise the loop never terminates
}
_map.clear();
I've built almost the exact same hybrid data structure you are after (list/map with the same algorithmic complexity if I were to use unordered_map instead) and have been using it from time to time for almost a decade though it's a kind of bulky structure (something I'd use with convenience in mind more than efficiency).
It's worth noting that this is quite different from just using std::unordered_map directly. For a start, it preserves the original order in which one inserts elements. Insertion, removal, and searches are guaranteed to happen in logarithmic time (or constant time depending on whether key searching is involved and whether you use a hash table or BST), iterators do not get invalidated on insertion/removal (the main requirement I needed which made me favor std::map over std::unordered_map), etc.
The way I did it was like this:
// I use this as the iterator for my container with
// the list being the main 'focal point' while I
// treat the map as a secondary structure to accelerate
// key searches.
typedef typename std::list<Value>::iterator iterator;
// Values are stored in the list.
std::list<Value> data;
// Keys and iterators into the list are stored in a map.
std::map<Key, iterator> accelerator;
If you do it like this, it becomes quite easy. push_back is a matter of pushing back to the list and adding the last iterator to the map. Iterator removal is a matter of removing the key pointed to by the iterator from the map before removing the element from the list. Finding a key is a matter of searching the map and returning the associated value, which happens to be the list iterator. Key removal is just finding a key and then doing iterator removal, etc.
If you want to improve all methods to constant time, then you can use std::unordered_map instead of std::map as I did here (though that comes with some caveats).
Taking an approach like this should simplify things considerably over an intrusive list-based solution where you're manually having to free memory.
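The operations described above could be fleshed out roughly like this. It is a sketch, not the author's actual container: the class name ordered_map is made up, and storing the key next to the value in the list (so that iterator removal can find the key) is an assumption:

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <map>
#include <utility>

// Hypothetical insertion-ordered map: the list owns the (key, value) pairs and
// fixes the order; the map only accelerates key searches.
template <typename Key, typename Value>
class ordered_map {
public:
    typedef typename std::list<std::pair<Key, Value>>::iterator iterator;

    iterator begin() { return data_.begin(); }
    iterator end() { return data_.end(); }

    // Appends at the back; returns end() if the key already exists.
    iterator push_back(const Key& key, const Value& value) {
        if (accelerator_.count(key)) return data_.end();
        data_.push_back(std::make_pair(key, value));
        iterator it = std::prev(data_.end());
        accelerator_[key] = it;
        return it;
    }

    // Key search goes through the map and yields the list iterator.
    iterator find(const Key& key) {
        auto pos = accelerator_.find(key);
        return pos == accelerator_.end() ? data_.end() : pos->second;
    }

    // Iterator removal: drop the key from the map first, then the list node.
    iterator erase(iterator it) {
        accelerator_.erase(it->first);
        return data_.erase(it);
    }

    // Key removal: find, then iterator removal. Returns false if not found.
    bool erase(const Key& key) {
        iterator it = find(key);
        if (it == data_.end()) return false;
        erase(it);
        return true;
    }

private:
    std::list<std::pair<Key, Value>> data_;
    std::map<Key, iterator> accelerator_;
};
```

Iterators obtained before an insertion or a removal of some other element stay valid, because std::list never relocates its nodes.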
Is there a reason why you are not using built-in containers like, say, STL?
Anyhow, you don't show the code where the allocation takes place, nor the map definition (is this coming from a library?).
Are you sure you deallocate all of the previously allocated Cells, starting from the last one and going backwards up to the first?
You could do this using the STL (remove next from Cell):
std::unordered_map<int,std::list<Cell>>
Or if cell only contains a char
std::unordered_map<int,std::string>
If your compiler doesn't support std::unordered_map then try boost::unordered_map.
If you really want to use intrusive data structures, have a look at Boost Intrusive.
As others have pointed out, it may be hard to see what you're doing wrong without seeing your code.
Someone should mention, however, that you're not helping yourself by overlaying two container types here.
If you're using a hash_map, you already have constant insertion and deletion time; see the related post "Hash : How does it work internally?". The only exception to the O(1) lookup time is when your implementation decides to resize the container, in which case you pay that overhead regardless of your linked list addition. Having two addressing schemes is only going to make things slower (not to mention buggier).
Sorry if this doesn't point you to the memory leak, but I'm sure a lot of memory leaks / bugs come from not using stl / boost containers to their full potential. Look into that first.
You need to be very careful with what you are doing, because values in a C++ map need to be copyable and with your structure that has raw pointers, you must handle your copy semantics properly.
You would be far better off using std::list where you won't need to worry about your copy semantics.
If you can't change that then at least std::map<int, Cell*> will be a bit more manageable, although you would have to manage the pointers in your map because std::map will not manage them for you.
You could of course use std::map<int, shared_ptr<Cell> >, probably easiest for you for now.
If you also use shared_ptr within your Cell object itself, you will need to beware of circular references, and as Cell will know it's being shared_ptr'd you could derive it from enable_shared_from_this.
My final point will be that list is very rarely the correct collection type to use. It is the correct one to use sometimes, especially when you have an LRU cache situation and you want to move accessed elements to the end of the list fast. However that is the minority case and it probably doesn't apply here. Think of an alternative collection you really want. map< int, set<char> > perhaps? or map< int, vector< char > > ?
Your list has a lot of overhead for storing a few chars.
I wonder whether it is possible to have a map that works like a boost circular buffer, meaning it would have a limited size and, once that size is reached, would start overwriting the first inserted elements. I also want to be able to search through such a buffer and find or create with [name]. Is it possible to create such a thing, and how?
What you want is an LRU (least recently used) Map, or LRA (least recently added) Map depending on your needs.
Implementations already exist.
Well, I don't think that structure is present out of the box in boost (it may exist elsewhere, though), so you would have to create it. I wouldn't recommend using operator[]() though, at least as it is implemented in std::map, because it can make it difficult to track the elements added to the map (for example, calling operator[]() with a key that is not present inserts a default-constructed value). Go for more explicit get and put operations for adding and retrieving elements of the map instead.
As for the easiest implementation, I would use an actual map as the storage, and a deque to record the order in which keys were added (not tested):
template <typename K, typename V>
struct BoundedSpaceMap
{
    typedef std::map<K,V> map_t;
    typedef std::deque<K> deque_t;
    // ...
    typedef typename map_t::value_type value_type;
    // Reuse map's iterators
    typedef typename map_t::iterator iterator;
    // ...
    iterator begin() { return map_.begin(); }
    // put
    void put ( K k, V v)
    {
        // Record the key only if it was really inserted, so that a repeated
        // key does not show up twice in the deque.
        if (map_.insert(std::make_pair(k,v)).second)
            deque_.push_back(k);
        _ensure(); // enforce the size limit by removing the oldest element
    }
    // ...
private:
    map_t map_;
    deque_t deque_;
    void _ensure() {
        if (deque_.size() > LIMIT) { // LIMIT: maximum number of entries, defined elsewhere
            map_.erase(deque_.front()); deque_.pop_front();
        }
    }
};
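A self-contained variant of the same idea, as a sketch only: the class name bounded_map, the constructor-supplied limit, and the choice to overwrite the value of an existing key in place are all assumptions, not part of the answer above.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <utility>

// Hypothetical "least recently added" map: holds at most `limit` entries and
// evicts the oldest inserted key when a new key would exceed that.
template <typename K, typename V>
class bounded_map {
public:
    explicit bounded_map(std::size_t limit) : limit_(limit) {}

    // Inserts or overwrites. Only new keys are recorded in the deque, so a key
    // never appears twice in the eviction order.
    void put(const K& k, const V& v) {
        auto result = map_.insert(std::make_pair(k, v));
        if (!result.second) {
            result.first->second = v;   // existing key: update in place
            return;
        }
        order_.push_back(k);
        if (order_.size() > limit_) {   // over capacity: drop the oldest key
            map_.erase(order_.front());
            order_.pop_front();
        }
    }

    // Returns a pointer to the value, or nullptr if the key is absent.
    const V* get(const K& k) const {
        auto it = map_.find(k);
        return it == map_.end() ? nullptr : &it->second;
    }

    std::size_t size() const { return map_.size(); }

private:
    std::size_t limit_;
    std::map<K, V> map_;
    std::deque<K> order_;   // keys in insertion order, oldest at the front
};
```

Using explicit put and get, as recommended above, avoids the accidental insertions that operator[] would cause.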
Well not really a "circular buffer" since that doesn't make much sense for a map, but we can use a simple array without any additional linked lists or anything.
This is called closed hashing (also known as open addressing); the wiki article summarizes it quite nicely. Double hashing is the most often used variant, as it avoids clustering (which leads to worse performance), but it has its own problems (locality).
Edit: Since you want a specific implementation, I don't think boost has one, but this or this were mentioned in another SO post about closed hashing.