C++ controlling memory access and pointer dereferencing

I am writing an API in C++ where I want to restrict what a programmer can do with pointers to objects the API creates.
For example,
// API
class object {
// details unimportant
};
// Programmer's code
object o; // OK
object *op = &o; // OK
long *lp = (long *)op; // No
object o2 = op[100]; // No
I understand that some of this is probably difficult or impossible given C++'s type system. Are there ways to enforce this kind of type usage pattern? Are there ways to restrict the available operations on a given pointer type? Would something like overloading object's operator& work?
class object {
object_pointer operator&();
};

There is no way in C++ to prevent C-style casting of pointers. Any (non-function) pointer may be C-style cast to any other pointer type.
And given a pointer, there is no way to tell the compiler "This is definitely not an array." So you can't prevent op[100] either.
What you could do is hide all the members of the class itself, e.g. using the Pimpl idiom or C-style opaque structures/classes. But this won't help you in terms of restricting users' abilities to obtain pointers to your instances and cast them.
C++ is a language for consenting adults. If you don't consent, use another language.
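To make the opaque-type suggestion above concrete, here is a minimal sketch. All names (`widget`, `widget_create`, `widget_value`, `widget_destroy`, `widget_round_trip`) are hypothetical and not from the question. In a real library the struct definition would live in a separate .cpp file; it is in the same block here only so the example compiles on its own. Users who see only the forward declaration can still cast the pointer, but they cannot dereference it, index it (`op[100]` does not compile on an incomplete type), or create instances by value, so this also gives up the `object o;` line the question wanted to allow.

```cpp
#include <cassert>

// --- what the public header would expose (hypothetical names) ---
struct widget;                          // incomplete type: size and members are hidden
widget* widget_create(int value);
int     widget_value(const widget* w);
void    widget_destroy(widget* w);

// --- what would live in the library's own .cpp file ---
struct widget {
    int value;
};

widget* widget_create(int value) { return new widget{value}; }

int widget_value(const widget* w) {
    assert(w != nullptr);               // only the API ever touches the members
    return w->value;
}

void widget_destroy(widget* w) { delete w; }

// Small helper that exercises the API the way client code would.
int widget_round_trip(int value) {
    widget* w = widget_create(value);
    int result = widget_value(w);
    widget_destroy(w);
    return result;
}
```

Client code would only ever hold a `widget*` and pass it back into the API functions.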

Related

Reason for using smart pointers with a container

Simply put, I would like to ask: what is a good reason to use smart pointers, for example std::unique_ptr?
However, I am not asking for reasons to use smart pointers over regular (dumb) pointers. I think everybody knows that, or a quick search can find the reason.
What I am asking is a comparison of these two cases:
Given a class (or a struct) named MyObject use
std::queue<std::unique_ptr<MyObject>> queue;
rather than
std::queue<MyObject> queue;
(it can be any container, not necessarily a queue)
Why should someone use option 1 rather than 2?
That is actually a good question.
There are a few reasons I can think of:
Polymorphism works only with references and pointers, not with value types. So if you want to hold derived objects in a container, you can't have std::queue<MyObject>. One option is unique_ptr; another is reference_wrapper.
The contained objects are referenced (*) from outside of the container. Depending on the container, the elements it holds can move, invalidating previous references to them. Examples are std::vector::insert or a move of the container itself. In this case std::unique_ptr<MyObject> ensures that the reference stays valid regardless of what the container does with it (as long as the unique_ptr is alive, of course).
In the following example in Objects you can add a bunch of objects in a queue. However two of those objects can be special and you can access those two at any time.
struct MyObject { MyObject(int); };
struct Objects
{
std::queue<std::unique_ptr<MyObject>> all_objects_;
MyObject* special_object_ = nullptr;
MyObject* secondary_special_object_ = nullptr;
void AddObject(int i)
{
all_objects_.emplace(std::make_unique<MyObject>(i));
}
void AddSpecialObject(int i)
{
auto& emplaced = all_objects_.emplace(std::make_unique<MyObject>(i));
special_object_ = emplaced.get();
}
void AddSecondarySpecialObject(int i)
{
auto& emplaced = all_objects_.emplace(std::make_unique<MyObject>(i));
secondary_special_object_ = emplaced.get();
}
};
(*) I use "reference" here in its English meaning, not the C++ type: any way to refer to an object (e.g. via a raw pointer).
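The polymorphism point from the start of this answer can be sketched as follows. The `Shape`/`Circle` hierarchy and the two function names are made up for illustration and assume C++14 or later for `std::make_unique`. The first function stores by value and loses the derived part; the second stores owning pointers and keeps it.

```cpp
#include <cassert>
#include <memory>
#include <queue>
#include <string>

// Hypothetical hierarchy, used only for illustration.
struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const { return "shape"; }
};

struct Circle : Shape {
    std::string name() const override { return "circle"; }
};

// Storing by value: only the Shape part of the Circle is copied in (slicing).
std::string name_from_value_queue() {
    std::queue<Shape> q;
    q.push(Circle{});
    assert(!q.empty());
    return q.front().name();
}

// Storing an owning pointer: the dynamic type survives.
std::string name_from_pointer_queue() {
    std::queue<std::unique_ptr<Shape>> q;
    q.push(std::make_unique<Circle>());
    assert(!q.empty());
    return q.front()->name();
}
```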
Usecase: You want to store something in a std::vector with constant indices, while at the same time being able to remove objects from that vector.
If you use pointers, you can delete a pointed-to object and set vector[i] = nullptr (and also check for it later), which is something you cannot do when storing the objects themselves. If you stored objects, you would have to keep the instance in the vector and use a flag such as bool valid, because erasing an object from a vector shifts the index of every later element down by one.
Note: As mentioned in a comment to this answer, the same can be achieved using std::optional if you have access to C++17 or later.
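A rough sketch of the std::optional variant, assuming C++17. The `SlotTable` class and its member names are invented for this example. Removing an element empties its slot instead of erasing it, so every other index stays valid.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container: slots are emptied, never erased, so indices stay stable.
class SlotTable {
public:
    // Appends a value and returns its permanent index.
    std::size_t add(std::string value) {
        slots_.emplace_back(std::move(value));
        return slots_.size() - 1;
    }

    // Empties the slot; the indices of all other elements are unaffected.
    void remove(std::size_t index) {
        assert(index < slots_.size());
        slots_[index].reset();
    }

    bool contains(std::size_t index) const {
        return index < slots_.size() && slots_[index].has_value();
    }

    const std::string& at(std::size_t index) const {
        assert(contains(index));
        return *slots_[index];
    }

private:
    std::vector<std::optional<std::string>> slots_;
};
```

Compared with a vector of unique_ptr, this keeps the objects inline in the vector's storage, at the cost of one extra flag per slot.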
The first declaration generates a container with pointer elements and the second one generates pure objects.
Here are some benefits of using pointers over objects:
They allow you to create dynamically sized data structures.
They allow you to manipulate memory directly (such as when packing or unpacking data from hardware devices).
They allow object references (function or data objects).
They allow you to manipulate an object (through an API) without needing to know the details of the object (other than the API).
(raw) pointers are usually well matched to CPU registers, which makes dereferencing a value via a pointer efficient. (C++ “smart” pointers are more complicated data objects.)
Also, polymorphism is considered as one of the important features of Object-Oriented Programming.
In C++ polymorphism is mainly divided into two types:
Compile-time Polymorphism
This type of polymorphism is achieved by function overloading or operator overloading.
Runtime Polymorphism
This type of polymorphism is achieved by function overriding. To call the overriding functions through the base class, you must use pointers (or references) rather than objects.

Getter for large member variables w/o copying

I have a class containing large member variables. In my case, the large member variable is a container of many objects and it must be private as I don't want to allow a user to modify it directly
class Example {
public:
std::vector<BigObject> get_very_big_object() const { return very_big_object; }
private:
std::vector<BigObject> very_big_object;
};
I want a user to be able to view the object without making a copy:
Example e;
auto very_big_object = e.get_very_big_object(); // Uh oh, made a copy
cout << very_big_object[11]; // Look at any element in the vector etc
I'm a bit confused about the best way to do it. I thought about returning a constant reference, i.e., make my getter:
const std::vector<BigObject>& get_very_big_object() const { return very_big_object; }
I read this article that suggests it could be risky and that a smart pointer std::unique_ptr could be better, but that this problem can be best solved using modern C++11 move semantics. But I found that a bit cryptic.
What's the modern best practice for doing this?
I read this article that suggests it could be risky and that a smart pointer std::unique_ptr could be better, but that this problem can be best solved using modern C++11 move semantics.
On this point, the article is flat-out wrong. A smart pointer does not remove the "risk".
Quick summary of relevant parts of the article
If a class returns a const reference to a data member, client code may introduce a const_cast and thereby change the data member without going through the class' API.
The article proposes (incorrectly) that the above can be avoided by using a smart pointer. The setup is for the class to maintain a shared pointer to the data, and have the getter return that pointer cast to a shared pointer to const data.
Critique of the points
First of all, this does not work. All one has to do is de-reference the smart pointer to get a const reference to the data, which can then be const_cast as before. Using the author's own example, instead of
std::string &evil = const_cast<std::string&>(obj.someStr());
use
std::string &evil = const_cast<std::string&>(*obj.str_ptr());
to get the same data-changing results when returning a smart pointer. The entire article is not wrong, but it does get several points wrong. This is one of them.
Second of all, this is not your concern. When you return a const reference, you are telling client code that this value is not to be changed. If the client code does so anyway, it's the client code that broke the agreement. (If the underlying object was itself declared const, modifying it through a const_cast is outright undefined behavior; otherwise it is a breach of your class's contract.) Either way, your class is not responsible for what happens next.
What's the modern best practice for doing this?
Simply return a const reference. (Most rules have exceptions, but in my experience, this one seems to be on target 95-99.9% of the time.)
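One thing to watch on the calling side: plain `auto` drops the reference and copies anyway, so the caller has to write `const auto&`. The sketch below assumes C++17 (for `static inline`). The copy counter in `BigObject`, the trailing underscore on the member, and the two helper functions are illustrative additions, not part of the question's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type that counts how often it is copied.
struct BigObject {
    static inline int copies = 0;
    int payload = 0;
    BigObject() = default;
    BigObject(const BigObject& other) : payload(other.payload) { ++copies; }
    BigObject& operator=(const BigObject&) = default;
};

class Example {
public:
    explicit Example(std::size_t n) : very_big_object_(n) {}
    const std::vector<BigObject>& get_very_big_object() const { return very_big_object_; }
private:
    std::vector<BigObject> very_big_object_;
};

// Number of element copies made when the caller binds a const reference.
int copies_with_reference(const Example& e) {
    int before = BigObject::copies;
    const auto& view = e.get_very_big_object();  // binds to the member, no copy
    assert(!view.empty());
    return BigObject::copies - before;
}

// Number of element copies made when the caller uses plain auto.
int copies_with_plain_auto(const Example& e) {
    int before = BigObject::copies;
    auto copy = e.get_very_big_object();         // auto deduces a value type: full copy
    assert(!copy.empty());
    return BigObject::copies - before;
}
```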
What I did when I was working on my BDD library for school was to create a wrapper class called VeryBigObject, which contacts the singleton upon instantiation and hides a reference-counting pointer. From there you can overload operator->() to allow direct access to the wrapped class's methods.
So something like this
class VeryBigObject {
private:
vector<BigObject>* obj;
public:
VeryBigObject() {
// find a way to instantiate with a pointer, not by copying
}
VeryBigObject(const VeryBigObject& o) {
// Update reference counts
obj = o.obj;
}
vector<BigObject>* operator->() const { return obj; } // forwards member access to the wrapped vector
// ... Add other overloads as you see fit to mask working with the pointer directly.
};
This allows you to create a small portable class that you don't have to worry about copying, but also has access to the larger object easily. You'll still need to worry about things like caching and such though

how to best assign the pointer onto a _vector_ object to a "void *" variable in c++?

I have a scenario with two "worlds" of C++ code separated by a calling barrier that is C-only for design reasons. (In more detail: I have a main thread and multiple child threads, where each child services calls to a set of functions with passed arguments and returns the function's result. The interconnect is pure C, but the architecture is shared memory, and for some of the calls the data to pass are C++ vector objects.)
Doing it the simple way on a vector failed for me - this statement only gets the pointer on the data of the object but not the object pointer itself:
vector<something> my_object;
void * argv0 = &my_object;
If I understood correctly, the class is designed to give me a pointer to its data array rather than a pointer to the object (which also has special members for management, like size or allocated space). As the target layer cannot manage and update those special members, any alteration that needs them cannot be done. In other words, operator= has a class-defined pairing of "(void *) = (vector)", and I don't see how to overcome that in a direct C++ fashion.
My next best guess was this C fashion approach:
typedef union
{
void * pvObject;
vector<something> * pcObject;
} VECTOR_VOID_UNION_T;
vector<something> my_object;
VECTOR_VOID_UNION_T uVV;
uVV.pcObject = &my_object;
void * argv0 = uVV.pvObject;
I am really not sure if this is the best or only way to do it with this sort of class design. There might be other operators, like the C++ extended casting operators, that solve the access problem to the object pointer itself much more gently, but so far nothing I have tried has succeeded.
My question is now:
How do I correctly and more elegantly overcome that class-defined operator= (or one of its equivalents) in a C++ fashion, so that the pointer to the vector object [edit: not the vector data] is finally stored in a variable of type "void*"?
You are probably looking for this:
void *argv0 = reinterpret_cast<void *>(&my_object);
If you're trying to get a void* that points to the vector's data:
assert(!theVector.empty());
void* thePtr = static_cast<void*>(theVector.data());
It's kind of hard to tell what you're asking, though.
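For what it's worth, here is a small round trip through void*. Any object pointer converts to void* implicitly, so no cast is needed on the way in, and since std::vector does not overload operator&, `&my_object` is the address of the vector object itself, not of its data. On the way back a static_cast to the exact original pointer type is required. The function names are hypothetical, and the sketch assumes both sides share one address space and are built with the same compiler and standard library, as the shared-memory threads in the question suggest.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical worker on the far side of the C-only boundary: it receives just a void*.
void append_on_other_side(void* arg) {
    assert(arg != nullptr);
    // The cast must name exactly the type whose address was taken.
    auto* v = static_cast<std::vector<int>*>(arg);
    v->push_back(42);   // size and capacity are updated correctly: this is the real vector
}

std::size_t round_trip_size() {
    std::vector<int> my_object{1, 2, 3};
    void* argv0 = &my_object;        // implicit conversion, pointer to the vector object
    append_on_other_side(argv0);
    return my_object.size();
}
```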

How do I decide if I should use a pointer or a regular variable?

In c++ I can declare a field as a regular variable of some type, instantiate it in the constructor, and use it later:
private: Foo field;
...
A::A() {
// upd: I am likely instantiating field the wrong way (see comments)
field = FieldImpl();
}
....
method(field);
Or alternatively I can use a pointer:
private: Foo* field;
...
A::A() {
field = new FieldImpl();
}
A::~A() {
delete field;
}
...
method(*field);
When declaring a field, how do I decide if I should use a pointer or a regular variable?
You might want to use a pointer if:
The referenced object can outlive the parent.
Because of size, you want to ensure the referenced object is on the heap.
The pointer is provided from outside the class.
Null is a possible value.
The field can be set dynamically to a different object.
The actual object type is determined at runtime. For example, the field might be a base-class pointer to any of a number of subclasses.
You might also want to use a smart pointer.
The last point above applies to your sample code. If your field is of type Foo, and you assign a FieldImpl to it, all that remains is the Foo part of the FieldImpl. This is referred to as the slicing problem.
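Here is a short illustration of the slicing problem and of the smart-pointer alternative. `Foo` and `FieldImpl` are given a virtual `describe()` purely for demonstration, and the two owner classes are hypothetical; `std::make_unique` assumes C++14 or later.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct Foo {
    virtual ~Foo() = default;
    virtual std::string describe() const { return "Foo"; }
};

struct FieldImpl : Foo {
    std::string describe() const override { return "FieldImpl"; }
};

// Value member: assigning a FieldImpl keeps only its Foo part (slicing).
class SlicedOwner {
public:
    SlicedOwner() { field_ = FieldImpl(); }
    std::string describe() const { return field_.describe(); }
private:
    Foo field_;
};

// Smart-pointer member: the dynamic type is kept and no destructor has to be written.
class PointerOwner {
public:
    PointerOwner() : field_(std::make_unique<FieldImpl>()) {}
    std::string describe() const {
        assert(field_ != nullptr);
        return field_->describe();
    }
private:
    std::unique_ptr<Foo> field_;
};
```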
Regular variable if
Foo is an integral part of the class, i.e., every instance always has its own Foo and
Foo is not too large (it can go on the stack).
Pointer if
several instances may share a single Foo,
there may be instances that don't have a Foo at some point, or
Foo is really large and should always be on the heap.
C++ is designed with automatic variables in mind. Critical C++ idioms like RAII depend on automatic variables. Because of C++'s design decisions, using automatic variables is simpler and easier than using pointers. You shouldn't add the complexity of using pointers unless you actually need the capabilities they provide. (And if you have to then use a smart pointer.)
Complexity needs to be justified, and in this example you haven't shown any reason for the extra complexity in your pointer example, so you should use an automatic variable.
Well, after 10 years of C# and Java, pointers are simple and regular variables are the complexity for me :) so there should be more serious reasons not to use pointers. For example, I guess pointers are not processor-cache friendly.
C# and Java are designed differently than C++. Their syntax and runtime are designed to make pointers the simpler (or only) method, and they take care of some of the problems that creates for you behind the scenes. Trying to work around the language to avoid pointers in Java and C# would add complexity.
Furthermore, C# and Java rely to a much greater degree on polymorphic types, and they don't have the "don't pay for what you don't use" policy C++ has. Pointers are needed for polymorphic types, but C# and Java are happy to make you pay the cost even when you don't need to, whereas C++ doesn't do that.

Which kind of (auto) pointer to use?

I came across several questions where answers state that using T* is never the best idea.
While I already make much use of RAII, there is one particular point in my code where I use T*. Reading about several auto-pointers, I couldn't find one that would give me a clear advantage.
My scenario:
class MyClass
{
...
// This map is huge and only used by MyClass
// and several objects that are only used by MyClass as well.
HashMap<string, Id> _hugeIdMap;
...
void doSomething()
{
MyMapper mapper;
// Here is what I pass. The reason I can't pass a const-ref is
// that the mapper may possibly assign new IDs for keys not yet in the map.
mapper.setIdMap(&_hugeIdMap);
mapper.map(...);
}
};
MyMapper now has a HashMap<...>* member, which - according to highly voted answers in questions on unrelated problems - is never a good idea (although the mapper will go out of scope before the instance of MyClass does, and hence I do not consider it too much of a problem. There's no new in the mapper and no delete will be needed).
So what is the best alternative in this particular use-case?
Personally I think a raw pointer (or reference) is okay here. Smart pointers are concerned with managing the lifetime of the object pointed to, and in this case MyMapper isn't managing the lifetime of that object, MyClass is. You also shouldn't have a smart pointer pointing to an object that was not dynamically allocated (which the hash map isn't in this case).
Personally, I'd use something like the following:
class MyMapper
{
public:
MyMapper(HashMap<string, Id> &map)
: _map(map)
{
}
private:
HashMap<string, Id> &_map;
};
Note that this will prevent MyMapper from having an assignment operator, and it can only work if it's acceptable to pass the HashMap in the constructor; if that is a problem, I'd make the member a pointer (though I'd still pass the argument as a reference, and do _map(&map) in the initializer list).
If it's possible for MyMapper or any other class using the hash map to outlive MyClass, then you'd have to start thinking about smart pointers. In that case, I would probably recommend std::shared_ptr, but you'd have to use it everywhere: _hugeIdMap would have to be a shared_ptr to a dynamically allocated value, not a regular non-pointer field.
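A sketch of what that shared_ptr variant might look like, in case the mapper really can outlive MyClass. `std::unordered_map` stands in for the question's HashMap, and `Id` as an int, `makeMapper`, the ID-assignment rule, and `map_after_owner_destroyed` are all assumptions made for the example (C++17 for `try_emplace`).

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

using Id = int;                                     // assumed; the question does not define Id
using IdMap = std::unordered_map<std::string, Id>;  // stand-in for the question's HashMap

class MyMapper {
public:
    explicit MyMapper(std::shared_ptr<IdMap> ids) : ids_(std::move(ids)) {
        assert(ids_ != nullptr);
    }

    // Returns the existing ID for key, or assigns the next free one.
    Id map(const std::string& key) {
        auto it = ids_->try_emplace(key, static_cast<Id>(ids_->size())).first;
        return it->second;
    }

private:
    std::shared_ptr<IdMap> ids_;   // shares ownership, so the map outlives MyClass if needed
};

class MyClass {
public:
    MyClass() : hugeIdMap_(std::make_shared<IdMap>()) {}
    MyMapper makeMapper() const { return MyMapper(hugeIdMap_); }
private:
    std::shared_ptr<IdMap> hugeIdMap_;   // now dynamically allocated, as described above
};

// The mapper keeps working after the MyClass that created the map is gone.
Id map_after_owner_destroyed(const std::string& key) {
    auto owner = std::make_unique<MyClass>();
    MyMapper mapper = owner->makeMapper();
    owner.reset();
    return mapper.map(key);
}
```

If the mapper never outlives MyClass, as in the question, this is more machinery than needed and the reference or raw-pointer member remains simpler.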
Update:
Since you said that using a reference is not acceptable due to the project's coding standards, I would suggest just sticking with a raw pointer for the reasons mentioned above.
Naked pointers (normally referred to as raw pointers) are just fine when the holder has no responsibility to delete the pointed-to object. In the case of MyMapper, the pointer points to an object already owned by MyClass, so it is absolutely fine not to delete it. Problems arise when you use raw pointers and do intend for objects to be deleted through them. People only ask questions when they have problems, which is why you almost always see raw pointers discussed in a problematic context, but raw pointers in a non-owning context are fine.
How about passing it into the constructor and keeping a reference (or const-reference) to it? That way your intent of not owning the object is made clear.
Passing auto-pointers or shared-pointers is mostly for communicating ownership.
shared pointers indicate it's shared
auto-pointers indicate it's the receiver's responsibility
references indicate it's the sender's responsibility
blank (raw) pointers indicate nothing.
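In code, the list above might translate into signatures like these. This is only a sketch with made-up function names; it uses std::unique_ptr in the auto-pointer role, since std::auto_ptr was deprecated in C++11 and removed in C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using Data = std::vector<int>;

// shared_ptr by value: ownership is shared, and the callee holds its own share.
long use_count_inside(std::shared_ptr<Data> d) { return d.use_count(); }

// unique_ptr by value: the receiver takes over responsibility for the object.
std::size_t consume(std::unique_ptr<Data> d) {
    assert(d != nullptr);
    return d->size();
}   // the Data object is destroyed here

// Reference: the sender keeps responsibility; the callee only borrows.
void append(Data& d, int value) { d.push_back(value); }

// Raw pointer: the signature alone says nothing about ownership.
// Here it is used as an optional, non-owning borrow.
std::size_t size_or_zero(const Data* d) { return d ? d->size() : 0; }
```

A caller hands over a unique_ptr with `consume(std::move(p))`, after which `p` is null.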
About your coding style:
our coding standards have a convention that says never pass non-const references.
Whether you use the C++ reference mechanism or the C++ pointer mechanism, you're passing a (English-meaning) reference to the internal storage that will change. I think your coding standard is trying to tell you not to do that at all: the point is not that references in particular are forbidden, but that you should find another way to achieve the result.