Should I avoid implicit behavior by passing a redundant parameter? - c++

I'm developing a class that, among other things, will create a block of memory and perform some processing on this memory. The user will do something like this:
MyClass m;
float* data = m.createData();
/* user writes to `data` ... */
m.processData();
Keep in mind that createData() will be called only once, MyClass keeps an internal pointer to data, and all processData() calls will always act on this data.
My question is about the signature of the processData() method. I'm a bit uncomfortable with the fact that processData() implicitly modifies data. Should I require data to be passed as a parameter (even being redundant) just to make this behavior explicit to the user?
MyClass m;
float* data = m.createData();
/* user writes to `data` ... */
m.processData(data);

I'm a bit uncomfortable with the fact that processData() implicitly modifies data.
Actually, processData doesn't modify data at all. Given your description, it modifies the object that data points to. But only because the internal pointer happens to point to the same object.
Should I require data to be passed as a parameter (even being redundant) just to make this behavior explicit to the user?
If you intend to use the internal pointer anyway, then definitely not. Requiring the user to pass an argument that's not used would be very confusing.
If you intend to use the passed pointer instead, then it would not make much sense to store the pointer within the class.
user needs to access this data (write and read).
The object oriented approach is to not return the pointer to the data, but instead write member functions to MyClass that perform the writing and reading.
A non-object-oriented approach is fine as well: Replace createData and processData with free functions, return a std::unique_ptr<float[]> (or better yet, use std::vector) to the data, and get rid of MyClass entirely.
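A minimal sketch of the free-function alternative, assuming the data is a plain buffer of floats. The `count` parameter and the doubling step are placeholders I made up for illustration; the real processing is not shown in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical replacement for MyClass::createData(): the caller owns the buffer.
std::vector<float> createData(std::size_t count) {
    return std::vector<float>(count, 0.0f);
}

// Hypothetical replacement for MyClass::processData(): the buffer that gets
// modified is visible in the signature, so nothing happens implicitly.
void processData(std::vector<float>& data) {
    for (float& value : data) {
        value *= 2.0f;  // placeholder for the real processing
    }
}

void example() {
    std::vector<float> data = createData(3);
    data[0] = 1.5f;      // user writes to the data
    processData(data);   // explicit: this call modifies `data`
    assert(data[0] == 3.0f);
}
```

With this shape there is no stored pointer, so the parameter is no longer redundant: it is the only way the function can reach the data.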

No, you should not.
Your function createData() returns a pointer. This implies that the data to which the pointer points might be modified.
If you want to pass a copy of your data to the user, don't pass a pointer (but it sounds like you don't want that).
processData() is a member function of your MyClass - as long as it is not marked const you don't give any guarantees that you don't modify the internal data.
The only point where I would watch out is if you want to invalidate the pointer you passed to the user. Then you should explicitly say so in your documentation, or even switch to some kind of smart pointer.
If this approach seems too unsafe for you, it might be better not to return a pointer, but instead to provide member functions for read/write access. This gives you full control over how and when the user changes your data, e.g. for thread safety.
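A rough sketch of the member-function approach, assuming a float buffer of fixed size. The `write`, `read`, and `size` names and the increment step are my own placeholders, not part of the asker's class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical variant of MyClass that never hands out a pointer.
class MyClass {
public:
    explicit MyClass(std::size_t count) : data_(count, 0.0f) {}

    // All access goes through the class, which could add locking or
    // validation here (std::vector::at throws on an out-of-range index).
    void write(std::size_t index, float value) { data_.at(index) = value; }
    float read(std::size_t index) const { return data_.at(index); }
    std::size_t size() const { return data_.size(); }

    // Not const: the signature itself signals that internal data may change.
    void processData() {
        for (float& value : data_) {
            value += 1.0f;  // placeholder for the real processing
        }
    }

private:
    std::vector<float> data_;
};

void example() {
    MyClass m(2);
    m.write(0, 2.0f);
    m.processData();
    assert(m.read(0) == 3.0f);
}
```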

Related

Is this the right way to return a struct in a parameter?

I made the following method in a C++/CLI project:
void GetSessionData(CDROM_TOC_SESSION_DATA& data)
{
    auto state = CDROM_TOC_SESSION_DATA{};
    // ...
    data = state;
}
Then I use it like this in another method:
CDROM_TOC_SESSION_DATA data;
GetSessionData(data);
// do something with data
It does work, returned data is not garbage, however there's something I don't understand.
Question:
C++ is supposed to clean up state when it has exited its scope, so data is a copy of state, correct?
And how exactly is it different from the following, which you see in many examples:
CDROM_TOC_SESSION_DATA data;
GetSessionData(&data); // signature should be GetSessionData(CDROM_TOC_SESSION_DATA *data)
Which one makes more sense to use, or is the right way?
Reference:
CDROM_TOC_SESSION_DATA
Using a reference vs a pointer for an out parameter is really more of a matter of style. Both function equally well, but some people feel that the explicit & when calling a function makes it more clear that the function may modify the parameter it was passed.
i.e.
doAThing(someObject);
// It's not clear that doAThing accepts a reference and
// therefore may modify someObject
vs
doAThing(&someObject);
// It's clear that doAThing accepts a pointer and it's
// therefore possible for it to modify someObject
Note that 99% of the time the correct way to return a class/struct type is to just return it. i.e.:
MyType getObject()
{
    MyType object{};
    // ...
    return object;
}
Called as
auto obj = getObject();
In the specific case of CDROM_TOC_SESSION_DATA it likely makes sense to use an out parameter, since the class contains a flexible array member. That means that the parameter is almost certainly a reference/pointer to the beginning of some memory buffer that's larger than sizeof(CDROM_TOC_SESSION_DATA), and so must be handled in a somewhat peculiar way.
C++ is supposed to clean up state when it has exited its scope, so
data is a copy of state, correct?
In the first example, the statement
data = state
presumably copies the value of state into local variable data, which is a reference to the same object that is identified by data in the caller's scope (because those are the chosen names -- they don't have to match). I say "presumably" because in principle, an overloaded assignment operator could do something else entirely. In any library you would actually want to use, you can assume that the assignment operator does something sensible, but it may be important to know the details, so you should check.
The lifetimes of local variables data and state end when the method exits. They will be cleaned up at that point, and no attempt may be made to access them thereafter. None of that affects the caller's data object.
And in what exactly it is different from the following you see on many
examples:
CDROM_TOC_SESSION_DATA data;
GetSessionData(&data);
Not much. Here the caller passes a pointer instead of a reference. GetSessionData must be declared appropriately for that, and its implementation must explicitly dereference the pointer to access the caller's data object, but the general idea is the same for most intents and purposes. Pointer and reference are similar mechanisms for indirect access.
Which one makes more sense to use or is the right way ?
It depends. Passing a reference is generally a bit more idiomatic in C++, and it has the advantage that the method does not have to worry about receiving a null or invalid pointer. On the other hand, passing a pointer is necessary if the function has C linkage, or if you need to accommodate the possibility of receiving a null pointer.
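To make the comparison concrete, here is a small sketch of both styles side by side. `SessionData` is a hypothetical stand-in with made-up fields, not the real CDROM_TOC_SESSION_DATA, and the values written are arbitrary.

```cpp
#include <cassert>

// Hypothetical stand-in for a plain struct such as CDROM_TOC_SESSION_DATA.
struct SessionData {
    int firstSession = 0;
    int lastSession = 0;
};

// Reference version: the callee can assume `data` refers to an object.
void getSessionData(SessionData& data) {
    SessionData state{};
    state.firstSession = 1;
    state.lastSession = 2;
    data = state;  // copies state into the caller's object
}                  // state is destroyed here; the caller's copy is unaffected

// Pointer version: the callee has to handle a null pointer itself.
bool getSessionData(SessionData* data) {
    if (data == nullptr) {
        return false;
    }
    getSessionData(*data);  // forwards to the reference version
    return true;
}

void example() {
    SessionData byReference;
    getSessionData(byReference);
    assert(byReference.lastSession == 2);

    SessionData byPointer;
    assert(getSessionData(&byPointer));
    assert(byPointer.firstSession == 1);
}
```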

Is it safe to assume the sqlite3_exec database pointer does not change its value?

The declaration of sqlite3_exec is using a non-const pointer to non-const sqlite3 object
sqlite3_exec( sqlite3* db, const char* command, ... )
This is reasonable, since the function will need a pointer to traverse the database and the data within can be modified by the command passed to sqlite3_exec.
However, is it safe to assume that sqlite3_exec always returns with the sqlite3* db storing the same address as before? Even if errors occur?
The reason why this question arises, is because I try to write a C++ wrapper, using RAII (most likely following the rule of zero). The most natural way to represent the pointer to the database is therefore a std::unique_ptr. Obviously I cannot pass it directly to sqlite3_exec, but I could do this:
sqlite3_exec( myUniquePointer.get(), ... );
Alternatively one could release the pointer and transfer the ownership back to the unique_ptr, but this is less elegant. So the way via get() would be preferred, but therefore the pointer would not be allowed to have a different state after the execution, because the unique_ptr could not track it and would point to an inappropriate address.
You're confused about the way the language works. A function that takes a pointer argument receives its own copy of the pointer, so it cannot modify the caller's pointer; it can only modify the object the pointer points to.
No matter what sqlite3_exec does, therefore, the value of db in the caller (the "address") will be unchanged.
(The functions sqlite3_close and sqlite3_close_v2 will invalidate the database pointer, by deallocating the memory that it points to, but even then, the bit representation of the pointer is unchanged and it's possible for a correct program to observe that fact.)
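The point about pass-by-value can be checked with a small self-contained sketch. `Handle` and `execOnHandle` are hypothetical stand-ins for an opaque C handle and a C function that uses it; they are not the SQLite API.

```cpp
#include <cassert>
#include <memory>

// Hypothetical opaque handle, standing in for something like sqlite3.
struct Handle {
    int statementsRun = 0;
};

// Hypothetical C-style function taking the handle pointer by value.
void execOnHandle(Handle* db) {
    db->statementsRun += 1;  // modifies the pointed-to object
    db = nullptr;            // only changes this function's local copy
}

bool pointerUnchangedAfterCall() {
    std::unique_ptr<Handle> owner(new Handle);
    Handle* before = owner.get();
    execOnHandle(owner.get());
    assert(owner.get() == before);  // the unique_ptr still holds the same address
    return owner->statementsRun == 1;
}
```

So handing `get()` to such a function leaves the unique_ptr's stored address intact, whatever the function does with its own parameter.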

Should I use pointers or move semantics for passing big chunks of data?

I have a question about recommended coding technique. I have a tool for model analysis and I sometimes need to pass a large amount of data (from a factory class to one that holds multiple heterogeneous chunks).
My question is whether there is some consensus about if I should rather use pointers or move the ownership (I need to avoid copying when possible as the size of a data-block may be as big as 1 GB).
The pointer version would look like this:
class FactoryClass {
    ...
public:
    static Data * createData() {
        Data * data = new Data;
        ...
        return data;
    }
};
class StorageClass {
    unique_ptr<Data> data_ptr;
    ...
public:
    void setData(Data * _data_ptr) {
        data_ptr.reset(_data_ptr);
    }
};
void pass() {
    Data * data = FactoryClass::createData();
    ...
    StorageClass storage;
    storage.setData(data);
}
Whereas the move version is like this:
class FactoryClass {
    ...
public:
    static Data createData() {
        Data data;
        ...
        return data;
    }
};
class StorageClass {
    Data data;
    ...
public:
    void setData(Data _data) {
        data = move(_data);
    }
};
void pass() {
    Data data = FactoryClass::createData();
    ...
    StorageClass storage;
    storage.setData(move(data));
}
I like the move version better - yes, I need to add move commands to the main code, but then I in the end have just the objects in the storage and I do not have to care about pointer semantics anymore.
However, I am not entirely comfortable using move semantics, which I do not understand in detail. (I do not care about the C++11 requirement though, as the code already compiles only with GCC 4.7+.)
Would someone have a reference that would support either version? Or is there some other, preferred version of how to pass data?
I was not able to Google anything as the keywords usually led to other topics.
Thanks.
EDIT NOTE:
The second example got refactored to incorporate suggestions from the comments, the semantics remained unchanged.
When you are passing an object to a function, what you pass depends in part on how that function is going to use it. A function can use an object in one of three general ways:
1. It can simply reference the object for the duration of the function call, with the calling function (or its eventual parent up the call stack) maintaining ownership of the object. The reference in this case may be a constant reference or a modifiable reference. The function will not store this object long-term.
2. It can copy the object directly. It doesn't gain ownership of the original, but it does acquire a copy of the original, so as to store, modify, or do with the copy what it will. Note that the difference between #1 and this is that the copy is made explicit in the parameter list. For example, taking a std::string by value. But this could also be as simple as taking an int by value.
3. It can gain some form of ownership of the object. The function then has some responsibility over the object's destruction. This also allows the function to store the object long-term.
My general recommendation for the parameter types for these paradigms are as follows:
1. Take the object by an explicit language reference where possible. If that's not possible, try a std::reference_wrapper. If that can't work, and no other solutions seem reasonable, then use a pointer. A pointer would be for things like optional parameters (though C++17's std::optional makes that less necessary; pointers still have their uses), language arrays (though again, we have objects that cover most of the uses of these), and so forth.
2. Take the object by value. That one's pretty non-negotiable.
3. Take the object either by value-move (i.e. move it into a by-value parameter) or by a smart pointer to the object (which will also be taken by value, since you're going to copy/move it anyway). The problem with your code is that you're transferring ownership via a pointer, but with a raw pointer. Raw pointers have no ownership semantics. The moment you allocate any pointer, you should immediately wrap it in some kind of smart pointer. So your factory function should have returned a unique_ptr.
Your case appears to be #3. Which you use between value-move and smart pointer is entirely up to you. If you have to heap allocate Data for some reason, then the choice is pretty much made for you. If Data can be stack allocated, then you have some options.
I would generally do this based on an estimation of Data's internal size. If internally, it's just a few pointers/integers (and by "few", I mean like 3-4), then putting it on the stack is fine.
Indeed, it can be better because you'll have less chance of a double-cache-miss. If your Data functions often just access data from another pointer, if you store Data by pointer, then every function call on it will have to dereference your stored pointer to fetch the internal one, then dereference the internal one. That's two potential cache misses, since neither pointer has any locality with StorageClass.
If you store Data by value, it's much more likely that Data's internal pointer will already be in the cache. It has better locality with StorageClass's other members; if you accessed some of StorageClass before now, you already paid for a cache miss, so you are likely to already have Data in the cache.
But movement is not free. It's cheaper than a full copy, but it's not free. You're still copying the internal data (and possibly nulling out any pointers on the original). But then again, allocating memory on the heap isn't free either. Nor is deallocating it.
But then again, if you're not moving it around very often (you move it around to get it to its final location, but little more after that), even moving a larger object would be fine. If you're using it more than you're moving it, then the cache locality of the object's storage will probably win out over the cost of moving.
There ultimately aren't a lot of technical reasons to pick one or the other. I would say to default to movement where reasonable.
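For the smart-pointer flavor of case #3, a minimal sketch could look like this. The contents of `Data`, the `count` parameter, and the `data()` accessor are assumptions made for illustration; `new` is used instead of std::make_unique so the code stays within C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical payload; the real Data type is not shown in the question.
struct Data {
    std::vector<double> values;
};

struct FactoryClass {
    // Ownership is explicit in the return type; no raw owning pointer escapes.
    static std::unique_ptr<Data> createData(std::size_t count) {
        std::unique_ptr<Data> data(new Data);
        data->values.assign(count, 1.0);
        return data;
    }
};

class StorageClass {
public:
    // Taking the unique_ptr by value forces the caller to give up ownership.
    void setData(std::unique_ptr<Data> data) { data_ = std::move(data); }
    const Data* data() const { return data_.get(); }

private:
    std::unique_ptr<Data> data_;
};

std::size_t storeExample() {
    std::unique_ptr<Data> data = FactoryClass::createData(4);
    StorageClass storage;
    storage.setData(std::move(data));  // only the pointer moves, not the payload
    assert(data == nullptr);           // the caller no longer owns anything
    return storage.data()->values.size();
}
```

The value-move version from the question works the same way at the call site; the difference is only whether StorageClass holds the Data directly or through a pointer.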

How to pass std::unique_ptr around?

I am having my first attempt at using C++11 unique_ptr; I am replacing a polymorphic raw pointer inside a project of mine, which is owned by one class, but passed around quite frequently.
I used to have functions like:
bool func(BaseClass* ptr, int other_arg) {
    bool val;
    // plain ordinary function that does something...
    return val;
}
But I soon realized that I wouldn't be able to switch to:
bool func(std::unique_ptr<BaseClass> ptr, int other_arg);
Because the caller would have to hand over ownership of the pointer to the function, which I don't want. So, what is the best solution to my problem?
I thought of passing the pointer as a reference, like this:
bool func(const std::unique_ptr<BaseClass>& ptr, int other_arg);
But I feel very uncomfortable doing so: firstly, because it seems counterintuitive to pass something already typed as _ptr by reference, which would be a reference to a reference; secondly, because the function signature gets even bigger; thirdly, because the generated code would need two consecutive pointer indirections to reach my variable.
If you want the function to use the pointee, pass a reference to it. There's no reason to tie the function to work only with some kind of smart pointer:
bool func(BaseClass& base, int other_arg);
And at the call site use operator*:
func(*some_unique_ptr, 42);
Alternatively, if the base argument is allowed to be null, keep the signature as is, and use the get() member function:
bool func(BaseClass* base, int other_arg);
func(some_unique_ptr.get(), 42);
The advantage of using std::unique_ptr<T> (aside from not having to remember to call delete or delete[] explicitly) is that it guarantees that a pointer is either nullptr or it points to a valid instance of the (base) object. I will come back to this after I answer your question, but the first message is DO use smart pointers to manage the lifetime of dynamically allocated objects.
Now, your problem is actually how to use this with your old code.
My suggestion is that if you don't want to transfer or share ownership, you should always pass references to the object. Declare your function like this (with or without const qualifiers, as needed):
bool func(BaseClass& ref, int other_arg) { ... }
Then the caller, which has a std::shared_ptr<BaseClass> ptr will either handle the nullptr case or it will ask bool func(...) to compute the result:
if (ptr) {
    result = func(*ptr, some_int);
} else {
    /* the object was, for some reason, either not created or destroyed */
}
This means that any caller has to promise that the reference is valid and that it will continue to be valid throughout the execution of the function body.
Here is the reason why I strongly believe you should not pass raw pointers or references to smart pointers.
A raw pointer is only a memory address. It can have one of (at least) 4 meanings:
1. The address of a block of memory where your desired object is located. (the good)
2. The address 0x0 which you can be certain is not dereferencable and might have the semantics of "nothing" or "no object". (the bad)
3. The address of a block of memory which is outside of the addressable space of your process (dereferencing it will hopefully cause your program to crash). (the ugly)
4. The address of a block of memory which can be dereferenced but which doesn't contain what you expect. Maybe the pointer was accidentally modified and now it points to another writable address (of a completely other variable within your process). Writing to this memory location will cause lots of fun to happen, at times, during the execution, because the OS will not complain as long as you are allowed to write there. (Zoinks!)
Correctly using smart pointers alleviates the rather scary cases 3 and 4, which are usually not detectable at compile time and which you generally only experience at runtime when your program crashes or does unexpected things.
Passing smart pointers as arguments has two disadvantages: you cannot change the const-ness of the pointed object without making a copy (which adds overhead for shared_ptr and is not possible for unique_ptr), and you are still left with the second (nullptr) meaning.
I marked the second case as (the bad) from a design perspective. This is a more subtle argument about responsibility.
Imagine what it means when a function receives a nullptr as its parameter. It first has to decide what to do with it: use a "magical" value in place of the missing object? change behavior completely and compute something else (which doesn't require the object)? panic and throw an exception? Moreover, what happens when the function takes 2, or 3 or even more arguments by raw pointer? It has to check each of them and adapt its behavior accordingly. This adds a whole new level on top of input validation for no real reason.
The caller should be the one with enough contextual information to make these decisions, or, in other words, the bad is less frightening the more you know. The function, on the other hand, should just take the caller's promise that the memory it is pointed to is safe to work with as intended. (References are still memory addresses, but conceptually represent a promise of validity.)
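As a sketch of the const-ness point: a function taking `const BaseClass&` can be called from an owner holding a non-const object, with no copy of any smart pointer involved. `Derived` and the `value()` member are hypothetical, added only to make the example runnable.

```cpp
#include <cassert>
#include <memory>

struct BaseClass {
    virtual ~BaseClass() = default;
    virtual int value() const { return 1; }
};

// Hypothetical derived type, only here to exercise the polymorphic call.
struct Derived : BaseClass {
    int value() const override { return 2; }
};

// Read-only access is requested in the signature. No smart pointer type
// appears, so any owner (unique_ptr, shared_ptr, stack object) can call it.
bool func(const BaseClass& ref, int other_arg) {
    return ref.value() == other_arg;
}

bool callWithUniquePtr() {
    std::unique_ptr<BaseClass> ptr(new Derived);
    if (!ptr) {
        return false;  // the caller, not func, decides what "no object" means
    }
    assert(ptr->value() == 2);
    return func(*ptr, 2);  // const is added implicitly at the call site
}
```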
I agree with Martinho, but I think it is important to point out the ownership semantics of a pass-by-reference. I think the correct solution is to use a simple pass-by-reference here:
bool func(BaseClass& base, int other_arg);
The commonly accepted meaning of a pass-by-reference in C++ is like as if the caller of the function tells the function "here, you can borrow this object, use it, and modify it (if not const), but only for the duration of the function body." This is, in no way, in conflict with the ownership rules of the unique_ptr because the object is merely being borrowed for a short period of time, there is no actual ownership transfer happening (if you lend your car to someone, do you sign the title over to him?).
So, even though it might seem bad (design-wise, coding practices, etc.) to pull the reference (or even the raw pointer) out of the unique_ptr, it actually is not because it is perfectly in accordance with the ownership rules set by the unique_ptr. And then, of course, there are other nice advantages, like clean syntax, no restriction to only objects owned by a unique_ptr, and so on.
Personally, I avoid pulling a reference from a pointer/smart pointer. Because what happens if the pointer is nullptr? If you change the signature to this:
bool func(BaseClass& base, int other_arg);
You might have to protect your code from null pointer dereferences:
if (the_unique_ptr)
func(*the_unique_ptr, 10);
If the class is the sole owner of the pointer, the second of Martinho's alternatives seems more reasonable:
func(the_unique_ptr.get(), 10);
Alternatively, you can use std::shared_ptr. However, if there's one single entity responsible for delete, the std::shared_ptr overhead does not pay off.

private object pointer vs object value, and returning object internals

Related to: C++ private pointer "leaking"?
According to Effective C++ (Item 28), "avoid returning handles (references, pointers, or iterators) to object internals. It increases encapsulation, helps const member functions act const, and minimizes the creation of dangling handles."
Returning objects by value is the only way I can think of to avoid returning handles. This to me suggests I should return private object internals by value as much as possible.
However, to return object by value, this requires the copy constructor which goes against the Google C++ Style Guide of "DISALLOW_COPY_AND_ASSIGN" operators.
As a C++ newbie, unless I am missing something, I find these two suggestions to conflict each other.
So my questions are: is there no silver bullet which allows efficient reference returns to object internals that aren't susceptible to dangling pointers? Is the const reference return as good as it gets? In addition, should I not be using pointers for private object fields that often? What is a general rule of thumb for choosing when to store private instance fields of objects as by value or by pointer?
(Edit) For clarification, Meyers' example dangling pointer code:
class Rectangle {
public:
    const Point& upperLeft() const { return pData->ulhc; }
    const Point& lowerRight() const { return pData->lrhc; }
    ...
};
class GUIObject { ... };
const Rectangle boundingBox(const GUIObject& obj);
If the client creates a function with code such as:
GUIObject *pgo; // point to some GUIObject
const Point *pUpperLeft = &(boundingBox(*pgo).upperLeft());
"The call to boundingBox will return a new, temporary Rectangle object [(called temp from here.)] upperLeft will then be called on temp, and that call will return a reference to an internal part of temp, in particular, to one of the Points making it up...at the end of the statement, boundingBox's return value temp will be destroyed, and that will indirectly lead to the destruction of temp's Points. That, in turn, will leave pUpperLeft pointing to an object that no longer exists." Meyers, Effective C++ (Item 28)
I think he is suggesting to return Point by value instead to avoid this:
const Point upperLeft() const { return pData->ulhc; }
The Google C++ style guide is, shall we say, somewhat "special" and has led to much discussion on various C++ newsgroups. Let's leave it at that.
Under normal circumstances I would suggest that following the guidelines in Effective C++ is generally considered to be a good thing; in your specific case, returning an object instead of any sort of reference to an internal object is usually the right thing to do. Most compilers are pretty good at handling large return values (Google for Return Value Optimization, pretty much every compiler does it).
If measurements with a profiler suggest that returning a value is becoming a bottleneck, then I would look at alternative methods.
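Here is a simplified sketch of the by-value version, assuming Point is a small struct of two ints. Unlike Meyers' example, the Points are stored directly rather than behind pData, and `boundingBox()` takes no argument; both are simplifications of mine.

```cpp
#include <cassert>

struct Point {
    int x;
    int y;
};

class Rectangle {
public:
    Rectangle(Point upperLeft, Point lowerRight)
        : ulhc_(upperLeft), lrhc_(lowerRight) {}

    // Returning by value: the caller gets its own Point, which stays valid
    // after a temporary Rectangle is destroyed.
    Point upperLeft() const { return ulhc_; }
    Point lowerRight() const { return lrhc_; }

private:
    Point ulhc_;
    Point lrhc_;
};

// Hypothetical stand-in for boundingBox(); returns a temporary Rectangle.
Rectangle boundingBox() {
    return Rectangle(Point{1, 2}, Point{5, 6});
}

Point upperLeftOfBox() {
    // The temporary Rectangle dies at the end of this statement, but the
    // returned Point is an independent copy, so nothing dangles.
    Point corner = boundingBox().upperLeft();
    assert(corner.x == 1);
    return corner;
}
```

For a type this small the copy costs next to nothing, which is why measuring first, as suggested above, is usually the sensible order of operations.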
First, let's look at this statement in context:
According to Effective C++ (Item 28),
"avoid returning handles (references,
pointers, or iterators) to object
internals. It increases encapsulation,
helps const member functions act
const, and minimizes the creation of
dangling handles."
This is basically talking about a class's ability to maintain invariants (properties that remain unchanged, roughly speaking).
Let's say you have a button widget wrapper, Button, which stores an OS-specific window handle to the button. If the client using the class had access to the internal handle, they could tamper with it using OS-specific calls like destroying the button, making it invisible, etc. Basically by returning this handle, your Button class sacrifices any control it originally had over the button handle.
You want to avoid these situations in such a Button class by providing everything you can do with the button as methods in this Button class. Then you don't need to ever return a handle to the OS-specific button handle.
Unfortunately, this doesn't always work in practice. Sometimes you have to return the handle or pointer or some other internal by reference for various reasons. Let's take boost::scoped_ptr, for instance. It is a smart pointer designed to manage memory through the internal pointer it stores. It has a get() method which returns this internal pointer. Unfortunately, that allows clients to do things like:
delete my_scoped_ptr.get(); // wrong
Nevertheless, this compromise was required because there are many cases where we are working with C/C++ APIs that require regular pointers to be passed in. Compromises are often necessary to satisfy libraries which don't accept your particular class but do accept one of its internals.
In your case, try to think if your class can avoid returning internals this way by instead providing functions to do everything one would want to do with the internal through your public interface. If not, then you've done all you can do; you'll have to return a pointer/reference to it but it would be a good habit to document it as a special case. You should also consider using friends if you know which places need to gain access to the class's internals in advance; this way you can keep such accessor methods private and inaccessible to everyone else.
Returning objects by value is the only
way I can think of to avoid returning
handles. This to me suggests I should
return private object internals by
value as much as possible.
No, if you can return a copy, then you can equally return by const reference. The clients cannot (under normal circumstances) tamper with such internals.
It really depends on the situation. If you want the caller to see changes made by the called method, pass by reference. Remember that passing by value can be a fairly heavy operation for large objects: it requires a call to the copy constructor, which in essence has to allocate and fill enough memory to hold a copy of your object.
One thing you can do is fake pass by value. That means passing the actual argument to a method that accepts a const reference to your object. This of course means the caller does not expect to see changes to the object.
Try to limit pass by value if you can unless you have to.
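The copy cost can be made visible with a small sketch. `Widget` and its copy counter are hypothetical, added only to count how often the copy constructor runs in each case.

```cpp
#include <cassert>

// Hypothetical type that counts how often it is copied.
struct Widget {
    static int copies;
    int payload = 0;
    Widget() = default;
    Widget(const Widget& other) : payload(other.payload) { ++copies; }
};
int Widget::copies = 0;

// By value: the copy constructor runs for every lvalue argument.
int readByValue(Widget w) { return w.payload; }

// By const reference: no copy is made and the callee cannot modify the object.
int readByConstRef(const Widget& w) { return w.payload; }

int copiesMadeByValue() {
    Widget w;
    int before = Widget::copies;
    readByValue(w);
    return Widget::copies - before;
}

int copiesMadeByConstRef() {
    Widget w;
    int before = Widget::copies;
    readByConstRef(w);
    assert(Widget::copies == before);  // the const reference made no copy
    return Widget::copies - before;
}
```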