Theoretical clarification regarding maps and iterators - c++

If I have a class with a map as a private member such as
class MyClass
{
public:
MyClass();
std::map<std::string, std::string> getPlatforms() const;
private:
std::map<std::string, std::string> platforms_;
};
MyClass::MyClass()
{
platforms_["key1"] = "value1";
// ...
platforms_["keyN"] = "valueN";
}
std::map<std::string, std::string> MyClass::getPlatforms() const
{
return platforms_;
}
And in my main function would there be a difference between these two pieces of code?
Code1:
MyClass myclass;
std::map<std::string, std::string>::iterator definition;
for (definition = myclass.getPlatforms().begin();
definition != myclass.getPlatforms().end();
++definition){
std::cout << (*definition).first << std::endl;
}
Code2:
MyClass myclass;
std::map<std::string, std::string> platforms = myclass.getPlatforms();
std::map<std::string, std::string>::iterator definition;
for (definition = platforms.begin();
definition != platforms.end();
++definition){
std::cout << (*definition).first << std::endl;
}
In Code2 I just created a new map variable to hold the map returned from the getPlatforms() function.
Anyway, in my real code (which I cannot post the real code from but it is directly corresponding to this concept) the first way (Code1) results in a runtime error with being unable to access memory at a location.
The second way works!
Can you enlighten me as to the theoretical underpinnings of what is going on between those two different pieces of code?

getPlatforms() returns the map by value, rather than reference, which is generally a bad idea.
You have shown one example of why it is a bad idea:
getPlatforms().begin() is an iterator on a map that is gone before the iterator is used and getPlatforms().end() is an iterator on a different copy from the same original map.

"Can you enlighten me as to the theoretical underpinnings of what is going on between those two different pieces of code?"
When you return by value, you return a deep copy of the data.
When you call myclass.getPlatforms().begin() and myclass.getPlatforms().end(), you are effectively constructing two copies of your data, then getting the begin iterator from one copy and the end iterator from the other. Then you compare the two iterators for equality; this is undefined behavior.
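To make the "two copies" point concrete, here is a minimal sketch. The class is a cut-down stand-in for the question's MyClass, and copiesAreDistinct is a hypothetical helper added only for illustration. It keeps both returned maps alive as named variables so their storage can be compared safely.

```cpp
#include <cassert>
#include <map>
#include <string>

// Cut-down stand-in for the question's MyClass (contents are placeholders).
class MyClass {
public:
    MyClass() { platforms_["key1"] = "value1"; }
    // Returns by value, as in the question: every call builds a fresh copy.
    std::map<std::string, std::string> getPlatforms() const { return platforms_; }
private:
    std::map<std::string, std::string> platforms_;
};

// Hypothetical helper: true if two calls hand back equal but separate maps.
bool copiesAreDistinct(const MyClass& c) {
    std::map<std::string, std::string> copyA = c.getPlatforms();
    std::map<std::string, std::string> copyB = c.getPlatforms();
    assert(!copyA.empty() && !copyB.empty());  // begin() is dereferenced below
    // Both copies are alive here, so their first keys must live at different
    // addresses even though the maps compare equal.
    return copyA == copyB && &copyA.begin()->first != &copyB.begin()->first;
}
```

Since the two maps are separate objects, an iterator obtained from one can never reach the end() of the other.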
"results in a runtime error with being unable to access memory at a location."
This is because definition is initialized, then the temporary map used to create it is destroyed, invalidating the data the iterator pointed to. You then attempt to use that data through the iterator.

A further problem is that you should be using const_iterator, not iterator. getPlatforms() is const-qualified, so if it returns a reference to the member, that reference has to be const. On a const map, begin() resolves to the const-qualified overload, const_iterator begin() const, and its result cannot be converted to iterator.
Note: this only matters for Code1, whose getPlatforms() should, by the way, return const&. Code2 iterates over its own non-const copy, so iterator works there.
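Putting the two suggestions together (return const& and use const_iterator), Code1 could look roughly like the sketch below. The collectKeys helper is hypothetical; it gathers the keys into a vector instead of printing them so the result can be checked.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

class MyClass {
public:
    MyClass() {
        platforms_["key1"] = "value1";
        platforms_["keyN"] = "valueN";
    }
    // Returns a const reference: no copy, and every call refers to the same map.
    const std::map<std::string, std::string>& getPlatforms() const { return platforms_; }
private:
    std::map<std::string, std::string> platforms_;
};

// Hypothetical helper using the same loop shape as Code1.
std::vector<std::string> collectKeys(const MyClass& myclass) {
    std::vector<std::string> keys;
    // const_iterator is required because the map is seen through a const reference.
    std::map<std::string, std::string>::const_iterator definition;
    for (definition = myclass.getPlatforms().begin();
         definition != myclass.getPlatforms().end();
         ++definition) {
        keys.push_back(definition->first);
    }
    assert(keys.size() == myclass.getPlatforms().size());
    return keys;
}
```

Now begin() and end() come from the same object, so the comparison in the loop condition is well defined.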

Related

Get a vector of map keys without copying?

I have a map of objects where keys are std::string. How can I generate a vector of keys without copying the data?
Do I need to change my map to use std::shared_ptr<std::string> as keys instead? Or would you recommend something else?
My current code goes like this:
MyClass.h
class MyClass {
private:
std::map <std::string, MyType> my_map;
public:
const std::vector<std::string> getKeys() const;
};
MyClass.cpp
const std::vector<std::string> MyClass::getKeys() const
{
std::vector<std::string> keys = std::vector<std::string>();
for (const auto& entry : my_map)
keys.push_back(entry.first); // Data is copied here. How can I avoid it?
return keys;
}
As suggested by Kevin, std::views::keys was pretty much made for this. The view it produces is a lightweight object, not much more than a pointer to the range argument, which solves the ownership and lifetime issues. Iterating over this view is identical to iterating over the map, the only difference is that the view's iterator dereferences to the first element of the underlying map value.
As for some code, it is pretty simple:
// ...
#include <ranges>
class MyClass {
private:
std::map<std::string, MyType> my_map;
public:
std::ranges::view auto getKeys() const;
};
std::ranges::view auto MyClass::getKeys() const
{
return std::views::keys(my_map);
}
Since ranges and concepts go hand in hand, I've used the std::ranges::view to constrain the auto return type. This is a useful way to let the compiler and user of your function know that it will return a particular category of object, without having to specify a potentially complicated type.

Validity of pointer returned by operator->

I'm implementing a two-dimensional array container (like boost::multi_array<T,2>, mostly for practice). In order to use double-index notation (a[i][j]), I introduced a proxy class row_view (and const_row_view but I'm not concerned about constness here) which keeps a pointer to the beginning and end of the row.
I would also like to be able to iterate over rows and over elements within a row separately:
matrix<double> m;
// fill m
for (row_view row : m) {
for (double& elem : row) {
// do something with elem
}
}
Now, the matrix<T>::iterator class (which is meant to iterate over rows) keeps a private row_view rv; internally to keep track of the row the iterator is pointing to. Naturally, iterator also implements the dereferencing operators:
For operator*(), one would usually want to return a reference. Instead, here the right thing to do seems to be to return a row_view by value (i.e. return a copy of the private row_view). This ensures that when the iterator is advanced, the row_view still points to the previous row. (In a way, row_view acts like a reference would.)
For operator->(), I'm not so sure. I see two options:
1. Return a pointer to the private row_view of the iterator:
row_view* operator->() const { return &rv; }
2. Return a pointer to a new row_view (a copy of the private one). Because of storage lifetime, that would have to be allocated on the heap. In order to ensure clean-up, I'd wrap it in a unique_ptr:
std::unique_ptr<row_view> operator->() const {
return std::unique_ptr<row_view>(new row_view(rv));
}
Obviously, 2 is more correct. If the iterator is advanced after operator-> is called, the row_view that is pointed to in 1 will change. However, the only way I can think of where this would matter, is if the operator-> was called by its full name and the returned pointer was bound:
matrix<double>::iterator it = m.begin();
row_view* row_ptr = it.operator->();
// row_ptr points to view to first row
++it;
// in version 1: row_ptr points to second row (unintended)
// in version 2: row_ptr still points to first row (intended)
However, this is not how you'd typically use operator->. In such a use case, you'd probably call operator* and keep a reference to the first row. Usually, one would immediately use the pointer to call a member function of row_view or access a member, e.g. it->sum().
My question now is this: Given that the -> syntax suggests immediate use, is the validity of the pointer returned by operator-> considered to be limited to that situation, or would a safe implementation account for the above "abuse"?
Obviously, solution 2 is far more expensive, as it requires heap allocation. This is of course very much undesirable, as dereferencing is quite a common task and there is no real need for it: using operator* instead avoids these problems as it returns a stack-allocated copy of the row_view.
As you know, operator-> is applied recursively on the function's return type until a raw pointer is encountered. The only exception is when it's called by name like in your code sample.
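A tiny sketch of that drill-down rule, with made-up types (Inner, Proxy, Handle) that stand in for row_view, the proxy and the iterator. In this sketch the proxy happens to hold its payload by value, so the pointee stays alive until the end of the full expression.

```cpp
#include <cassert>

struct Inner { int value; };

// Intermediate object: not a pointer, so the compiler applies operator-> again.
struct Proxy {
    Inner inner;  // held by value, lives as long as the Proxy temporary
    Inner* operator->() { return &inner; }
};

struct Handle {
    int seed;
    // Returns a class type by value instead of a raw pointer.
    Proxy operator->() const { return Proxy{Inner{seed}}; }
};

// h->value is evaluated as h.operator->().operator->()->value.
int readThrough(const Handle& h) {
    return h->value;
}

// Illustrative check of the chain.
void checkDrillDown() {
    Handle h{42};
    assert(readThrough(h) == 42);
}
```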
You can use that to your advantage and return a custom proxy object. To avoid the scenario in your last code snippet, this object needs to satisfy several requirements:
Its type name should be private to the matrix<>::iterator, so outside code could not refer to it.
Its construction/copy/assignment should be private. matrix<>::iterator will have access to those by virtue of being a friend.
An implementation will look somewhat like this:
template <...>
class matrix<...>::iterator {
private:
class row_proxy {
row_view *rv_;
friend class iterator;
row_proxy(row_view *rv) : rv_(rv) {}
row_proxy(row_proxy const&) = default;
row_proxy& operator=(row_proxy const&) = default;
public:
row_view* operator->() { return rv_; }
};
public:
row_proxy operator->() {
row_proxy ret(/*some row view*/);
return ret;
}
};
The implementation of operator-> returns a named object to avoid any loopholes due to guaranteed copy elision in C++17. Code that uses the operator inline (it->mem) will work as before. However, any attempt to call operator->() by name without discarding the return value will not compile.
Live Example
struct data {
int a;
int b;
} stat;
class iterator {
private:
class proxy {
data *d_;
friend class iterator;
proxy(data *d) : d_(d) {}
proxy(proxy const&) = default;
proxy& operator=(proxy const&) = default;
public:
data* operator->() { return d_; }
};
public:
proxy operator->() {
proxy ret(&stat);
return ret;
}
};
int main()
{
iterator i;
i->a = 3;
// All the following will not compile
// iterator::proxy p = i.operator->();
// auto p = i.operator->();
// auto p{i.operator->()};
}
Upon further review of my suggested solution, I realized that it's not quite as fool-proof as I thought. One cannot create an object of the proxy class outside the scope of iterator, but one can still bind a reference to it:
auto &&r = i.operator->();
auto *d = r.operator->();
This allows operator->() to be applied again.
The immediate solution is to qualify the operator of the proxy object, and make it applicable only to rvalues. Like so for my live example:
data* operator->() && { return d_; }
This will cause the two lines above to emit an error again, while the proper use of the iterator still works. Unfortunately, this still doesn't protect the API from abuse, due to the availability of casting, mainly:
auto &&r = i.operator->();
auto *d = std::move(r).operator->();
Which is a death blow to the whole endeavor. There is no preventing this.
So in conclusion, there is no protection from a direct call to operator-> on the iterator object. At most, we can make the API really hard to use incorrectly, while the correct usage remains easy.
If creation of row_view copies is expensive, this may be good enough. But that is for you to consider.
Another point for consideration, which I haven't touched on in this answer, is that the proxy could be used to implement copy on write. But that class could be just as vulnerable as the proxy in my answer, unless great care is taken and fairly conservative design is used.

std::map of objects or object pointers?

I have two options to create a std map. I can work with both the types of map.
1. std::map<A, std::string>
2. std::map<A*, std::string>
where A is a class object
Later in the code I will have to perform a find operation.
1. std::map<A, std::string> myMap1;
if(myMap1.find(A_obj) != myMap1.end())
{
}
2. std::map<A*, std::string> myMap2;
if(myMap2.find(A_obj_ptr) != myMap2.end())
{
}
I want to know which one is recommended.
With which of these two would I not have to overload any operators in class A for the find operation to work? And which would have problems on insertion when no operators are overloaded?
If it helps, this is class A
class A
{
private:
std::vector<std::string> m_member;
public:
A(std::vector<std::string> input);
};
Note that these two samples are only functionally equivalent if A instances are singletons. Otherwise it's very possible to have two A values that are equal in value but different in address. This would lead to different semantics.
Personally I prefer the std::map<A, std::string> version because its semantics are crystal clear. The keys have equality semantics and there is no potential for a dangling or nullptr value. The std::map<A*, std::string> version comes with a host of questions for the developer looking through the code:
Who owns the key values?
Are all instances of A singletons? If not how do I ensure the A I'm looking for is the A* value that is stored?
When are the keys freed?
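To illustrate the difference in semantics, here is a rough sketch. The operator< is an assumption of this example (std::map<A, ...> needs some ordering on A, and the question's class does not define one), and the foundByValue / foundByPointer helpers are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

class A {
public:
    explicit A(std::vector<std::string> input) : m_member(std::move(input)) {}
    // Assumed ordering: lexicographic comparison of the member vector.
    friend bool operator<(const A& lhs, const A& rhs) { return lhs.m_member < rhs.m_member; }
private:
    std::vector<std::string> m_member;
};

// Stores `stored` in a map keyed on A, then looks up `probe` by value.
bool foundByValue(const A& stored, const A& probe) {
    std::map<A, std::string> byValue;
    byValue.insert(std::make_pair(stored, std::string("payload")));
    return byValue.find(probe) != byValue.end();
}

// Stores the address of `stored`, then looks up the address of `probe`.
bool foundByPointer(const A& stored, const A& probe) {
    std::map<const A*, std::string> byPointer;
    byPointer.insert(std::make_pair(&stored, std::string("payload")));
    return byPointer.find(&probe) != byPointer.end();
}

// Illustrative check: equal values, different addresses.
void checkKeySemantics() {
    A first(std::vector<std::string>{"x", "y"});
    A second(std::vector<std::string>{"x", "y"});
    assert(foundByValue(first, second));     // same value, so it is found
    assert(!foundByPointer(first, second));  // different object, so it is not
}
```

So the pointer-keyed map needs no operators on A at all, but it only finds the exact object that was inserted.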
The first option is preferable. For the second option, we need to make sure that the keys (pointers here) are protected; shared pointers may help. Another issue is that the map will be sorted by the addresses of the A objects, which might not be very useful. The sample below demonstrates how the comparator can be defined or the default comparator can be overridden:
class A
{
public:
int a;
};
namespace std
{
template<>
struct less<A*>
{
bool operator()(const A* const a, const A* const b) const{
return a->a < b->a;
}
};
}
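As an alternative sketch that leaves namespace std alone, the comparator can also be passed as the map's third template argument. CompareByValue, MapByValue and smallestLabel are names invented for this example.

```cpp
#include <cassert>
#include <map>
#include <string>

class A {
public:
    int a;
};

// Hypothetical comparator: orders A* keys by the pointed-to value, not by address.
struct CompareByValue {
    bool operator()(const A* lhs, const A* rhs) const { return lhs->a < rhs->a; }
};

// The comparator is part of the map type, so only this map is affected.
typedef std::map<A*, std::string, CompareByValue> MapByValue;

// Returns the label stored under the smallest key.
std::string smallestLabel(const MapByValue& m) {
    assert(!m.empty());  // precondition: begin() is dereferenced
    return m.begin()->second;
}
```

With this comparator, two different A objects holding the same value count as the same key, so find() succeeds for any pointer whose target compares equivalent. The keys must still outlive the map and must not be null.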

C++ destructor runtime error: failed to munmap

I've defined a class called ClusterSet that just has one field, called clusters:
class ClusterSet {
std::map<std::string, std::map<std::string, float>* > clusters;
public:
typedef std::map<std::string, std::map<std::string, float> *>::iterator iterator;
typedef std::map<std::string, std::map<std::string, float> *>::const_iterator const_iterator;
iterator begin() { return clusters.begin(); }
const_iterator begin() const { return clusters.begin(); }
iterator end() { return clusters.end(); }
const_iterator end() const { return clusters.end(); }
void create_cluster(std::string representative);
void add_member(std::string representative, std::string member, float similarity);
int write_to_file(std::string outputfile);
int size();
~ClusterSet();
};
In my create_cluster method, I use new to allocate memory for the inner map, and store this pointer in clusters. I defined a destructor so that I can deallocate all this memory:
ClusterSet::~ClusterSet() {
ClusterSet::iterator clust_it;
for (clust_it = clusters.begin(); clust_it != clusters.end(); ++clust_it) {
std::cout << "Deleting members for " << clust_it->first << std::endl;
delete clust_it->second;
}
}
When my destructor is called, it seems to deallocate all the inner maps correctly (it prints out "Deleting members for..." for each one). However, once that's done I get a runtime error that says "failed to "munmap" 1068 bytes: Invalid argument". What's causing this?
I have briefly looked at the "rule of three" but I don't understand why I would need a copy constructor or an assignment operator, or how that might solve my problem. I would never need to use either directly.
There's no good reason (and plenty of disadvantages) for dynamically allocating the inner maps. Change the outer map type to
std::map<std::string, std::map<std::string, float> >
and then you won't need to implement your own destructor and copy/move semantics at all (unless you want to change those you get from the map, perhaps to prevent copying your class).
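A minimal sketch of what that could look like. The member functions are trimmed down, and member_count and checkCopyIsIndependent are hypothetical additions used only to check the result.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

class ClusterSet {
    // Inner maps are stored by value, so the outer map owns and destroys them.
    std::map<std::string, std::map<std::string, float> > clusters;
public:
    void create_cluster(const std::string& representative) {
        clusters[representative];  // operator[] default-constructs an empty inner map
    }
    void add_member(const std::string& representative, const std::string& member, float similarity) {
        clusters[representative][member] = similarity;
    }
    std::size_t size() const { return clusters.size(); }
    // Hypothetical accessor, added only so the sketch can be checked.
    std::size_t member_count(const std::string& representative) const {
        std::map<std::string, std::map<std::string, float> >::const_iterator it =
            clusters.find(representative);
        return it == clusters.end() ? 0 : it->second.size();
    }
    // No destructor, copy constructor or assignment operator is declared:
    // the compiler-generated ones copy and destroy the nested maps correctly.
};

// Illustrative check: a copy is fully independent of the original.
void checkCopyIsIndependent() {
    ClusterSet original;
    original.create_cluster("rep");
    original.add_member("rep", "m1", 0.9f);
    ClusterSet copy = original;
    copy.add_member("rep", "m2", 0.5f);
    assert(original.member_count("rep") == 1);
    assert(copy.member_count("rep") == 2);
}
```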
If, in other circumstances, you really do need to store pointers to objects and tie their lifetime to their presence in the map, store smart pointers:
std::map<std::string, std::unique_ptr<something> >
If you really want to manage their lifetimes by hand for some reason, then you will need to follow the Rule of Three and give your class valid copy semantics (either preventing copying by deleting the copy constructor/assignment operator, or implementing whatever semantics you want). Even if you don't think you're copying objects, it's very easy to write code that does.
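If the raw pointers have to stay, the simplest of those options is probably to forbid copying outright. A sketch under that assumption (C++11 or later; the Members typedef and member_count are made up for the example):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <type_traits>

class ClusterSet {
    typedef std::map<std::string, float> Members;
    std::map<std::string, Members*> clusters;
public:
    ClusterSet() {}
    // A hand-written destructor means copying has to be dealt with as well.
    // Here it is forbidden, so two objects can never delete the same pointer.
    ClusterSet(const ClusterSet&) = delete;
    ClusterSet& operator=(const ClusterSet&) = delete;
    ~ClusterSet() {
        for (std::map<std::string, Members*>::iterator it = clusters.begin();
             it != clusters.end(); ++it) {
            delete it->second;
        }
    }
    void add_member(const std::string& representative, const std::string& member, float similarity) {
        Members*& slot = clusters[representative];  // null for a new representative
        if (slot == nullptr) slot = new Members();
        (*slot)[member] = similarity;
    }
    // Hypothetical accessor, added only so the sketch can be checked.
    std::size_t member_count(const std::string& representative) const {
        std::map<std::string, Members*>::const_iterator it = clusters.find(representative);
        return it == clusters.end() ? 0 : it->second->size();
    }
};

// Any accidental copy is now a compile-time error instead of a double delete.
static_assert(!std::is_copy_constructible<ClusterSet>::value, "copy construction is disabled");
static_assert(!std::is_copy_assignable<ClusterSet>::value, "copy assignment is disabled");
```

With copying disabled, passing a ClusterSet by value no longer compiles, which is usually how the hidden copy behind this kind of crash is found.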

iterating over the return value of a method in C++

Suppose I have a class Foo like so:
class Foo {
public:
std::vector<Bar> barVec() const {return barVec_;}
private:
std::vector<Bar> barVec_;
};
where Bar is some other class. So outside of Foo the only access to barVec_ is via the method barVec().
If myFoo is an instance of Foo, and pred is a unary predicate on Bar, is it okay to do something like this:
auto i = find_if(myFoo.barVec().begin(), myFoo.barVec().end(), pred);
if (i != myFoo.barVec().end()) {
//do some stuff here
}
Or do I have to assign myFoo.barVec() to a variable and iterate over that variable? For example:
std::vector<Bar> tmp = myFoo.barVec();
auto i = find_if(tmp.begin(), tmp.end(), pred);
if (i != tmp.end()) {
//do some stuff here
}
No, it’s not OK since you make a copy of the vector when returning it so the iterators that you’re comparing refer to different containers, which is undefined behaviour.
You could just return a (const-) reference but it’s much cleaner to overload begin and end for your class proper rather than exposing the vector.
Foo::barVec() returns a copy of a vector each time you call it. So the iterators returned by two calls to Foo::barVec().begin() and Foo::barVec().end() belong to different objects.
You probably meant to return a reference:
const std::vector<Bar>& barVec() const {return barVec_;}
but you should consider providing methods that return the begin() and end() iterators from your class directly, instead of exposing the underlying vector data member.
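A rough sketch of that last suggestion. Bar, hasIdSeven and containsSeven are placeholders invented for this example, and the constructor exists only so the sketch has some data to search.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Bar { int id; };  // placeholder for the question's Bar

class Foo {
public:
    typedef std::vector<Bar>::const_iterator const_iterator;
    explicit Foo(const std::vector<Bar>& bars) : barVec_(bars) {}
    // Both iterators always refer to the one vector stored in this object.
    const_iterator begin() const { return barVec_.begin(); }
    const_iterator end() const { return barVec_.end(); }
private:
    std::vector<Bar> barVec_;
};

// Hypothetical predicate standing in for pred.
bool hasIdSeven(const Bar& bar) { return bar.id == 7; }

// True if any Bar in myFoo satisfies the predicate.
bool containsSeven(const Foo& myFoo) {
    Foo::const_iterator i = std::find_if(myFoo.begin(), myFoo.end(), hasIdSeven);
    return i != myFoo.end();
}
```

A side benefit of naming the members begin() and end() is that Foo also works directly in a range-based for loop.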
I assume myBar should be myFoo and be of type Foo.
You have to assign it to a std::vector<Bar> object first. The problem is that myFoo.barVec() returns a copy of the vector stored inside myFoo. Each of the calls to barVec in your find_if line will return a different temporary std::vector<Bar> object. The begin and end iterators for those two temporary objects do not apply to the same sequence. This means you have undefined behaviour.