I would like to use a std::map (or probably std::unordered_map) where I insert custom object keys and double values, e.g. std::map<CustomClass,double>.
The order of the objects does not matter; only the (fast) lookup is important. My idea is to insert the address/pointer of the object instead, as that already has a comparator defined, i.e. std::map<CustomClass*,double>.
In "Pointers as keys in map C++ STL" it has been answered that this can be done, but I am still a bit worried that there might be side effects that are hard to catch later.
Specifically:
Can the address of an object change during runtime of the program? And could this lead to undefined behavior for my lookup in the map?
A test program could be:
auto a = adlib::SymPrimitive();
auto b = adlib::SymPrimitive();
auto c = adlib::mul(a,b);
auto d = adlib::add(c,a);
// adlib::Assignment holds std::map which assigns values to a,b
auto assignment = adlib::Assignment({&a,&b},{4,2});
// a=4, b=2 -> c=8 -> d=12
adlib::assertEqual(d.eval_fcn(assignment), 12);
which is user code, so users could potentially put the variables into a vector etc.
Update:
The answers made me think about users potentially inserting SymPrimitives into a vector. A simple scenario would be:
std::vector<adlib::SymPrimitive> syms{a,b};
auto assignment = adlib::Assignment({&syms[0],&syms[1]},{4,2}); // not allowed
The pitfall here is that syms[0] is a copy of a and has a different address. I could probably make it the user's responsibility to be aware of that.
Can the address of an object change during runtime of the program?
No. The address of an object never changes.
However, an object can stop existing at the address where it was created when the lifetime of the object ends.
Example:
std::map<CustomClass*,double> map;
{
CustomClass o;
map.emplace(&o, 3.14);
}
// the pointer within the map is now dangling; the pointed object does not exist
Also note that some operations on some containers move the elements of the container into new objects and destroy the old ones. After such an operation, references (in the general sense; this includes pointers and iterators) to those elements are invalid, and the behaviour of attempting to access the elements through those references is undefined.
Objects never change address during their lifetime. If all you want to do is look up some value associated with an object whose address is known at the time of the lookup, then using the address of the object as the key in a map should be perfectly safe.
(It is even safe if the object has been destroyed and/or deallocated, as long as you don't dereference the pointer and only use it as a key for looking up an item in the map. But you might want to figure out how to remove entries from the map when objects are destroyed or for other reasons shouldn't be in the map any more...)
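As a minimal sketch of the address-as-key idea (Sym, value_of and copy_is_a_different_key are hypothetical names, with Sym standing in for the question's adlib::SymPrimitive), the lookup below never dereferences the pointer. It also shows the pitfall from the update: a copy placed in a vector lives at a different address and is therefore a different key.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the question's adlib::SymPrimitive.
struct Sym {
    int payload = 0;
};

// Looks up the value registered for the object at address `key`.
// The pointer is only compared and hashed, never dereferenced.
// Returns `fallback` when that exact address was never registered.
double value_of(const std::unordered_map<const Sym*, double>& values,
                const Sym* key, double fallback) {
    auto it = values.find(key);
    return it == values.end() ? fallback : it->second;
}

// Returns true if the original is found and its copy is not.
bool copy_is_a_different_key() {
    Sym a;
    std::unordered_map<const Sym*, double> values;
    values.emplace(&a, 4.0);
    std::vector<Sym> syms{a};  // syms[0] is a copy of a at another address
    return value_of(values, &a, -1.0) == 4.0 &&
           value_of(values, &syms[0], -1.0) == -1.0;
}
```

Calling copy_is_a_different_key() is expected to return true, since only &a was registered.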
Related
Consider the case where I have a user defined type with say a id() member function which returns a unique std::string.
I want a container of this objects, where the id() uniquely identifies the elements, but I want to "use" the objects to do other things which may modify their members.
I am currently constructing the objects by calling std::set::emplace and capturing the returned (iterator, bool) pair.
But I am then not allowed to modify the element, as the iterator is const.
Is there a good way to do what I want? The only two I can think of are:
Store unique_ptrs to the object in the set, this way the pointer value is what differentiates it rather than the name and the object pointed to can be modified.
Store a map using the id() as the Key, but this means I have duplicated the keys.
I am happy to use well adopted and modern libraries, such as boost, if they have the right container for my problem.
Is there a good way to do what I want?
No not really. The granularity of std::set is at object level. There is no way to express that a portion of an object contributes to the key.
Some people recommend declaring all non-key members mutable. This is wrong, as mutable is meant for things that are hidden from the public interface of the object (e.g. a mutex).
The "official" way is to take the object out of the set, modify it and put it back in. C++17 has set::extract, which improves the performance of this task a bit (it still remains inefficient if you never modify the key, since the tree has to be checked/rebalanced anyway).
I want to "use" the objects to do other things which may modify their members.
If you're absolutely sure you never modify the object key, just cast away constness. From a legal point of view it is OK to cast away constness from objects that were not born const. For extra safety you can wrap the key into another, const member:
struct Element {
const Key key;
Value value;
};
This won't help if you have a data cube with multiple sets each using its own "view" on the key.
1. Store unique_ptrs to the object in the set
This would be a pessimization due to extra indirection. Since the elements are on the heap, you will take an extra cache miss. And again end up with UB if you accidentally modify the key.
2. Store a map using the id() as the Key
Yes, different variations of this approach are possible, but you must still ensure to never modify the key.
For example you could store a key + pointer to data. This approach is often combined with a dense_hash_set with linear probing for best performance. Since the value is accessed only once after the element is found, it doesn't really matter that it is located elsewhere.
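The simplest variation of the map approach can be sketched as follows. This is only an illustration under assumptions: Registry and Payload are hypothetical names, and the id is stored only as the map key rather than also inside the object, which avoids the duplication the question worries about. The key is const inside the map, while the mapped value stays freely modifiable.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical payload; the id lives only in the map key.
struct Payload {
    int hits = 0;
};

class Registry {
public:
    // Inserts a default payload for `id` if absent (C++17 try_emplace)
    // and returns the stored payload either way.
    Payload& get_or_create(const std::string& id) {
        return items_.try_emplace(id).first->second;
    }

    // Returns nullptr when the id is unknown.
    Payload* find(const std::string& id) {
        auto it = items_.find(id);
        return it == items_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::string, Payload> items_;
};
```

Usage would look like registry.get_or_create("x").hits += 1; the payload can be modified through the returned reference without touching the key.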
I would suggest using Boost.MultiIndex as a drop-in replacement for std::set, as it adds the modify method which allows modification of an element, checking whether the position within the container has changed:
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
struct S { /* ... */ };
boost::multi_index_container<S> t; // default configuration emulates std::set<S>
auto [it, inserted] = t.emplace(...);
t.modify(it, [&](S& s) {
// modify s here
// if the key is unchanged, s does not move
// the iterator `it` remains valid regardless
});
There is a small overhead in checking that the key is indeed unchanged, but this should be minimal compared to the rest of the program and should optimize and predict well.
std::set keeps its elements sorted, and the keys the elements are sorted by are the elements themselves. As a result, the elements in the std::set are const-qualified to prevent the user from modifying the elements (i.e., the keys) and thus breaking the std::set order.
Traditionally, if you wanted to modify an element of an std::set, you would have first to remove the element object you wish to modify from the std::set, modify it, and insert it into the std::set again. The problem is that this results in the allocation of an std::set internal node.
Since C++17 you can remove and reinsert an element into the std::set without allocating an std::set internal node thanks to std::set::extract(). This member function returns the node handle corresponding to the requested element. After modifying the element through this returned node, you can reinsert the node with the corresponding insert() overload. No node allocation takes place as you are reusing an already allocated node.
The drawback to these approaches – regardless of whether or not allocation occurs – is that reinserting the element into the std::set takes logarithmic time in the size of the set (unless you can take advantage of the hint to insert()).
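The extract-and-reinsert approach, including the insertion hint, might look like the sketch below. Item, ById and set_score are hypothetical names chosen for illustration; only the id takes part in the ordering, and C++17 is assumed.

```cpp
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <utility>

// Hypothetical element type: ordered by id only, score is payload.
struct Item {
    std::string id;
    int score;
};

struct ById {
    bool operator()(const Item& a, const Item& b) const { return a.id < b.id; }
};

using ItemSet = std::set<Item, ById>;

// Changes the score of the element with the given id by extracting its
// node, editing it, and reinserting the same node. Returns false if absent.
bool set_score(ItemSet& items, const std::string& id, int new_score) {
    auto it = items.find(Item{id, 0});
    if (it == items.end()) return false;
    auto hint = std::next(it);            // position just after the element
    auto node = items.extract(it);        // unlinks the node, frees nothing
    node.value().score = new_score;       // value() is mutable on a node handle
    items.insert(hint, std::move(node));  // reuses the node, allocates nothing
    return true;
}
```

Because the key is unchanged here, the hint points at the correct position, so the reinsertion avoids a full search of the tree.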
Casting away constness and modifying std::set elements
You can still cast const away from an element of the std::set and modify its data members, as long as your std::set's comparison function doesn't take the data members you change into account. That is, if you only modify data members that the comparison function doesn't consider, the order won't break.
Coming from a Java background I am confused with how C++ allows passing objects by value. I have a conceptual doubt regarding when objects are passed by value:
void add_to_vector(vector<SomeClass>& v, SomeClass var) {
v.push_back(var);
}
Is this conceptually correct? Here is why I feel this is wrong: var is being passed by value and the memory for the object will be allocated on the stack for the function call. It is then getting added to the vector. At the end of the function call, the stack will be cleared and hence the object being referenced by var will also be cleared. So vector will now contain an object which no longer exists after the function call.
Am I missing something?
You are missing the powerful concept of value semantics. Just like var is a local copy in the function, std::vector is designed such that after v.push_back(var);, v holds a copy of var. This means that the elements of v can be used without having to worry where they came from (unless SomeClass has members with referential semantics, or in some way or another touches shared state.)
Yes, you're missing C++ value semantics. In Java, vectors only hold object references, object values themselves reside on the heap and are collected when no longer used. In C++, vectors hold object values, so practically always the vector will hold its own private value independent of function's local. Even if you passed var by reference, vector would hold its own private copy. Regard them as deep copies.
You might want to push_back(std::move(var)) here BTW, when var is passed by value in your example, if you don't plan to use the value after push_back.
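A small sketch can make the value semantics concrete. SomeClass is given a single hypothetical member here, and stored_element_is_independent is an illustrative helper, not part of the question.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct SomeClass {
    std::string name;  // hypothetical member for illustration
};

// Takes `var` by value, then moves it into the vector.
// The vector ends up owning its own element.
void add_to_vector(std::vector<SomeClass>& v, SomeClass var) {
    v.push_back(std::move(var));
}

// Returns true if changing the caller's object leaves the stored
// element untouched.
bool stored_element_is_independent() {
    std::vector<SomeClass> v;
    SomeClass original{"first"};
    add_to_vector(v, original);  // `original` is copied into the parameter
    original.name = "changed";   // does not affect v[0]
    return v.size() == 1 && v[0].name == "first";
}
```

The element in the vector outlives both the parameter var and any later change to the caller's object.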
I understand that if a std::vector is resized (I believe only increased in size) that the memory location of the vector is re-located in an effort to find a new location in heap memory that will actually fit the new size. In that event, if I have pointer A, B, and C that previously pointed to elements inside the vector, they will be pointing to the old, de-allocated memory locations and no longer be valid.
I wanted to know if A: If it was possible to be notified when such an event happens, without me explicitly managing when a std::vector is resized, and B: how to deal with the pointers no longer referencing the correct location in memory.
Part B is kind of vague, so I will narrow the use case where I want to use this behavior. In my situation, I have a class that maintains a vector of objects that have several pointers inside themselves, like so:
class MultiPointer{
public:
Type1* t1;
Type2* t2;
Type3* t3;
MultiPointer(Type1*t1, Type2*t2, Type3*t3);
};
...
// attempt to create singleton pattern to make pool of each object type for each class
class Type1{
...
static vector<Type1> arr1 = ...
}
class Type2{
...
static vector<Type2> arr2 = ...
}
class Type3{
...
static vector<Type3> arr3 = ...
}
...
//note, would actually use CRTP to make sure I don't have to type out static vector<TypeN> each time
MultiPointer a(&Type1::arr1[0], &Type2::arr2[0], &Type3::arr3[0]);
MultiPointer b(&Type1::arr1[1], &Type2::arr2[1], &Type3::arr3[1]);
MultiPointer c(&Type1::arr1[2], &Type2::arr2[2], &Type3::arr3[2]);
std::vector<MultiPointer> mpv= {a,b,c};
... insertions into arr1,2,3 ...
//How do I make sure that a, b, and c still point to the same types?
Note that in my situation, I have access to arr1->3 at all times that are relevant, as its a static variable and I know the types I'm working with.
My idea was to copy the value of the pointer for arr1 into a separate variable, i.e.
Type1* arr1ptrcpy = &(Type1::arr1[0]);
Then I would check if the size changed when needed, and if the size changed I would check if the pointers were the same (or just check if the pointers are the same)
//ie in main
if(arr1ptrcpy != &(Type1::arr1[0])){
// for each multi pointer in mpv, adjust pointers for arr1.
...
}
If I noticed the addresses changed, I would do pointer arithmetic to find the correct placement of the old pointers relative to the new address.
// in MultiPointer
...
// if pointers don't match for arr1: keep the same offset from the new base
t1 = &Type1::arr1[0] + (t1 - arr1ptrcpy);
...
// if pointers don't match for arr2
t2 = &Type2::arr2[0] + (t2 - arr2ptrcpy);
...
// if pointers don't match for arr3
t3 = &Type3::arr3[0] + (t3 - arr3ptrcpy);
While I could do all of this with handles, that requires runtime overhead for pointer dereferencing, and I'll have a lot of pointers. In this situation, I'll be checking for vector changes much less often than dereferencing pointers. If this could be done with iterators without additional memory overhead per iterator, I would be willing to see how that could be done as well.
EDIT: I want to give more information on what my actual application is trying to accomplish/ how it is structured. In the actual usecase, there are many types of MultiPointer, some which reference the same object, but have otherwise different pointer types (hence the need to have pointers in the first place) and the objects they reference are used elsewhere outside of the context of a multipointer as well.
In my actual application, 'MultiPointer' acts as a grouping of the objects with different functionality attached to these groupings. For example, you might have, say, a PhysicsMultiPointer, which has a Position* and Velocity*, and a GraphicMultiPointer which has a DisplayImage* and a Position*. They both will point to the same Position object here.
SECOND EDIT:
I should mention I need all elements in the type vectors to be in contiguous memory, if that wasn't the case I wouldn't even bother with this whole ordeal since I would just have pointers to heap objects who wouldn't change location.
A: If it was possible to be notified when such an event happens, without me explicitly managing when a std::vector is resized,
No.
You can easily tell when it is going to happen, but you cannot tell after the fact when it has happened. vector reallocation only happens when you do something that increases the size of the vector past its current capacity (or you call shrink_to_fit). So if you insert 4 items into the vector, it will not reallocate itself if the capacity - size is 4 or more.
Now, you can use this fact to build wrapper functions/objects for the insertion functions, which will check the capacity before and after the insertion and return whether it has changed (if reallocation occurs, the capacity has to be increased).
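Such a wrapper function could be sketched like this. The name push_back_reports_realloc is hypothetical; the check relies only on the fact stated above, that reallocation during an insertion has to increase the capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Appends `value` and reports whether the vector reallocated its storage.
// A capacity change across push_back implies that reallocation occurred.
template <typename T>
bool push_back_reports_realloc(std::vector<T>& v, T value) {
    const std::size_t before = v.capacity();
    v.push_back(std::move(value));
    return v.capacity() != before;
}
```

A caller would refresh any stored pointers only when the function returns true. After a suitable reserve call it keeps returning false for as long as the size stays within the reserved capacity.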
However, a far better way to deal with this is to simply... not let it happen. Use the reserve function to make the capacity sufficiently large that you simply will have no need for reallocation. By doing that, you won't have to care about resetting pointers and the like.
Note that the above only covers pointer invalidation due to reallocation. Pointers and references are invalidated if you insert objects into the middle of the vector, but only those pointers/references to the element at the insertion point and all subsequent elements.
B: how to deal with the pointers no longer referencing the correct location in memory.
Don't use pointers. Instead, use indices. Unless you're inserting into the middle of the vector; indices can't help there.
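A sketch of the index-based alternative, using the Position and Velocity example from the question. Pools and PhysicsHandle are hypothetical names, and the pools are passed explicitly here instead of being static members.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Position { float x = 0, y = 0; };
struct Velocity { float dx = 0, dy = 0; };

// Hypothetical pools standing in for the question's static vectors.
struct Pools {
    std::vector<Position> positions;
    std::vector<Velocity> velocities;
};

// Stores indices rather than pointers, so reallocation of a pool
// cannot invalidate the handle.
struct PhysicsHandle {
    std::size_t position;
    std::size_t velocity;

    Position& pos(Pools& p) const { return p.positions[position]; }
    Velocity& vel(Pools& p) const { return p.velocities[velocity]; }
};
```

The cost is one extra addition per access (base plus index), and the elements stay contiguous. As noted above, this only holds as long as nothing is inserted into or erased from the middle of a pool.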
If indices can't work for you, then I would strongly reconsider using vector at all. If you need stability of elements that badly, then list or forward_list seems like a far more appropriate tool for the job. Or, if you have access to Boost, boost::stable_vector.
To be notified when the internal pointer in the vector is changed, you can wrap std::vector. You'll need to hook the functions push_back(), emplace_back(), resize(), shrink_to_fit() (just disable that one), reserve(), and insert(). It's easy enough to just check data() before and after, and invoke a callback if it changed.
I looked for the same type of question but I didn't find the answer to my question (existential one):
What type of hook should I choose to keep control over an object in a list?
I waver between pointer and iterator.
The container is filled at the beginning and shouldn't be resized after that. The hook is the way I use to switch between my objects at the whim of user and manipulating only one variable in my algorithm.
In all cases, I must go through an iterator to find the right object to hook. But which one is the best practice/use?
// 10 object list
std::list <Object> List(10);
std::list <Object>::iterator it = List.begin();
Object *pt = NULL;
// Select the 3rd object (index 2)
std::advance(it, 2);
pt = &(*it);
// Access to object member...
it->member;
pt->member;
Pointers do not allow access to neighbouring elements, unlike iterators, but may be unsafe.
What's the best practice?
It depends on what you want to do with the "hook". If you use
an iterator, it can be used as the starting point for moving
forward or backward in the list. If you use a pointer, you can
also point to objects outside of the list. In the end, it
depends on how you expect your code to evolve.
Storing pointers or iterators into a container is quite risky because you might find they're invalid by the time you use them (i.e. if the container or the data changes).
A more generalised and robust approach might be to use a map instead of a list. Every value is identified by a key (of whatever type you like), and you can easily store the keys and check whether or not they're valid before you use them, e.g.:
std::map<int, std::string> data;
// Add stuff to the map
data[5] = "blah";
data[27] = "foo";
// Check if a key exists
if (data.find(31) == data.end()) {
// Key 31 does NOT exist
} else {
// Key 31 DOES exist
}
One thing to be aware of though is that maps are ordered by key value. That means if the sequence of elements is important then you'll need to choose your keys carefully.
In most cases use references:
Object& ref = *it;
ref.member
It behaves like a pointer (so feel free to pass it to functions), but you can't do pointer arithmetic on it (ref++ will actually call operator++ on Object). You also can't initialize it from null (attempting to create such a reference is reported as an error).
One thing to remember: the object still needs to exist somewhere. If, say, some function removes the Object from List, you shouldn't use ref anymore.
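For the std::list from the question, a reference obtained once stays usable while other elements are added or removed, because a list does not relocate its elements. A minimal sketch, where nth is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

struct Object { int member = 0; };

// Returns a reference to the n-th element (0-based).
// Precondition: n < items.size().
Object& nth(std::list<Object>& items, std::size_t n) {
    return *std::next(items.begin(), static_cast<std::ptrdiff_t>(n));
}
```

With std::list<Object> items(10); and Object& ref = nth(items, 2);, later calls to push_front or push_back leave ref bound to the same object. Only erasing that particular element makes it dangle.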
So I have some C++ classes that use a map and a key class for a sort of data structure. In my insert method I use the typical map.insert. I want this function to return a pointer so I can modify some values (not the ones used for comparison) inside the inserted element. So I was wondering if it is safe to do this:
template<typename T>
NodeT<T> * TreeT<T>::
MakeNode(PointT point)
{
NodeT<T> * prNode = new NodeT<T>;
//set the contents for the node
prNode->SetNode(point, m_dTolerance);
//Create the key class using the
VectorKey key(point, m_dTolerance);
//Store the key,node as a pair for easy access
return_val = m_tree.insert( pair<VectorKey, NodeT<T> >(key, *prNode) );
if (return_val.second == false)
//if return_val.second is false it wasnt inserted
prNode = NULL;
else
//it was inserted, get a pointer to node
prNode = &(return_val.first->second); //is this safe if I plan to use it later?
return prNode;
}
I learned the hard way that my original pointer (the one I created with new) was pointing to the wrong element after the insert. Can anyone tell me why that is? So I used the return_val iterator to get the right pointer. I'd rather not return an iterator, but if it's safer then I will...
Thanks!
You seem to have trouble in your code with pointers and values. First you allocate an object on the heap (with new NodeT<T>).
Then you store a copy of that object in your map.
PS. And then you lose the original object forever, as you never free its memory, which leads to a memory leak.
In your case, the returned pointer is valid only while the element stays in the map. std::map does not move its elements when other elements are inserted or erased, but the pointer dangles as soon as that element is erased or the map is destroyed.
Storing pointers as map values decouples the object's lifetime from the map entry. The only thing you need to remember is to clean them up when removing an object from the map and when destroying the map itself.
The easy way to handle that would be using smart pointers (boost::shared_ptr for example ) or smart map class (boost::ptr_map for example ).
To fix that - use pointers everywhere ( store a pointer as a map value ).
This way - you will be able to return pointer from this function and it will be valid.
So just turn your map into map<VectorKey, NodeT<T>*> and this should fix most of your problems.
Do not forget to delete objects when erasing them from the map.
This code sample is interesting because it contains several things that are wrong or best avoided.
Implementation
The most important things have already been said (mainly by Bogolt):
You are leaking memory, because you allocate a NodeT<T> on the heap and never free it: the map stores a copy of the object, not the pointer. Indeed, you pass *prNode as the parameter, not prNode.
You use the heap to allocate the object (which is then copied into the map) and assume the allocation always succeeds. That is by far the most likely case, but it is not guaranteed: plain new throws std::bad_alloc on failure (only the nothrow form returns null), and the code does not handle it.
In any case, you use the heap when it is not really needed (and you can see the problems this introduces). You can just create the object on the stack and then insert it into the map, avoiding the previous problems and typing less code.
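The no-heap version might be sketched as below. Key and Node are hypothetical stand-ins for the question's VectorKey and NodeT<T>, and make_node is simplified accordingly. The pointer is taken from the insert result and refers to the element stored inside the map, which std::map keeps at the same address until that element is erased.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical stand-ins for the question's VectorKey and NodeT<T>.
using Key = int;
struct Node { double payload = 0.0; };

// Inserts a node for `key` and returns a pointer to the element stored in
// the map, or nullptr if the key was already present.
// No `new` is involved, so nothing can leak.
Node* make_node(std::map<Key, Node>& tree, Key key, double payload) {
    auto result = tree.insert(std::make_pair(key, Node{payload}));
    return result.second ? &result.first->second : nullptr;
}
```

The returned pointer can be used to modify the node later, as long as the element has not been erased and the map is still alive.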
Design
The function returns a pointer to the element in the map. Depending on the program, this may be safe. But what happens if the code uses the pointer after the object has been removed from the map? If you return a pointer, it is better not to return a raw one. Use a smart pointer (shared_ptr in this case) instead; with shared_ptr you will not have problems with the object's lifetime.
Another reason to use smart pointers: because insertion into the map implies a copy of the element, you are imposing a requirement on NodeT<T>: it has to be copy constructible. This requirement may not matter for performance here, but in other circumstances copying the object can have drawbacks. If you use a smart pointer (or boost::ptr_map), the object is created just once and is not copied.
Style
Just some suggestion, but not too important:
Instead of pair<VectorKey, NodeT<T> >(key, *prNode), write make_pair(key, *prNode). The code is more compact and clearer, with less typing.
Well, I'd say that depends on whether your map outlives anything that could use (and store) the pointer.
If yes (i.e. it's in some sort of singleton), you could go with it, but it is still not very safe, since any code could delete the pointer.
The best option is to store boost::shared_ptr (or std::shared_ptr since C++11) instead of raw pointers in your map and return only a weak_ptr after insertion.
That way, you're sure no other code can delete your object as long as the map holds it, and that no one can use a pointer to an object that has been erased from the map.
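A sketch of that shared_ptr/weak_ptr arrangement, under assumptions: Tree, Key and Node are hypothetical stand-ins for the question's types, and the map is the only long-term owner of each node.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical stand-ins for the question's VectorKey and NodeT<T>.
using Key = int;
struct Node { double payload = 0.0; };

class Tree {
public:
    // Inserts a node and hands out a non-owning weak_ptr.
    // Returns an empty weak_ptr if the key already exists.
    std::weak_ptr<Node> make_node(Key key, double payload) {
        auto node = std::make_shared<Node>();
        node->payload = payload;
        auto result = nodes_.emplace(key, node);
        return result.second ? std::weak_ptr<Node>(node)
                             : std::weak_ptr<Node>();
    }

    // Dropping the map entry releases the map's ownership of the node.
    void erase(Key key) { nodes_.erase(key); }

private:
    std::map<Key, std::shared_ptr<Node>> nodes_;
};
```

Callers use lock() to obtain a temporary shared_ptr before touching the node. After the entry is erased, lock() returns an empty pointer instead of handing back a dangling one.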