Coming from a Java background, I am confused by how C++ allows passing objects by value. I have a conceptual question about what happens when objects are passed by value:
void add_to_vector(vector<SomeClass>& v, SomeClass var) {
v.push_back(var);
}
Is this conceptually correct? Here is why I feel it is wrong: var is passed by value, so the memory for the object is allocated on the stack for the function call. It is then added to the vector. At the end of the function call, the stack is cleared, and hence the object referenced by var is also cleared. So the vector will now contain an object that no longer exists after the function call.
Am I missing something?
You are missing the powerful concept of value semantics. Just like var is a local copy in the function, std::vector is designed such that after v.push_back(var);, v holds a copy of var. This means that the elements of v can be used without having to worry about where they came from (unless SomeClass has members with referential semantics, or otherwise touches shared state).
Yes, you're missing C++ value semantics. In Java, containers hold only object references; the objects themselves reside on the heap and are garbage-collected when no longer used. In C++, a vector holds the objects themselves, so practically always the vector holds its own private value, independent of the function's local variable. Even if you passed var by reference, the vector would store its own private copy. For most well-behaved types, you can regard these as deep copies.
By the way, since var is passed by value in your example, you might want to write v.push_back(std::move(var)) if you don't plan to use var after the push_back.
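A minimal sketch of the point above, assuming SomeClass is a simple wrapper around a std::string (the question does not show its definition); make_vector is a hypothetical helper for illustration. The vector ends up owning its own element even after both the caller's local and the parameter var are gone.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the question's SomeClass: it just wraps a string.
struct SomeClass {
    std::string name;
};

// Takes var by value, then moves that local copy into the vector.
// The vector owns its own SomeClass object; nothing in it refers to var.
void add_to_vector(std::vector<SomeClass>& v, SomeClass var) {
    v.push_back(std::move(var));
}

std::vector<SomeClass> make_vector() {
    std::vector<SomeClass> v;
    {
        SomeClass local{"first"};
        add_to_vector(v, local);  // local is copied into the parameter var
    }                             // local and var are gone; v[0] is unaffected
    return v;
}
```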
I made the following method in a C++/CLI project:
void GetSessionData(CDROM_TOC_SESSION_DATA& data)
{
auto state = CDROM_TOC_SESSION_DATA{};
// ...
data = state;
}
Then I use it like this in another method:
CDROM_TOC_SESSION_DATA data;
GetSessionData(data);
// do something with data
It works, and the returned data is not garbage; however, there's something I don't understand.
Question:
C++ is supposed to clean up state when it has exited its scope, so data is a copy of state, correct?
And how exactly is it different from the following, which you see in many examples:
CDROM_TOC_SESSION_DATA data;
GetSessionData(&data); // signature should be GetSessionData(CDROM_TOC_SESSION_DATA *data)
Which one makes more sense to use, or is the right way?
Reference:
CDROM_TOC_SESSION_DATA
Using a reference vs a pointer for an out parameter is really more of a matter of style. Both function equally well, but some people feel that the explicit & when calling a function makes it more clear that the function may modify the parameter it was passed.
For example:
doAThing(someObject);
// It's not clear that doAThing accepts a reference and
// therefore may modify someObject
vs
doAThing(&someObject);
// It's clear that doAThing accepts a pointer and it's
// therefore possible for it to modify someObject
Note that 99% of the time, the correct way to return a class/struct type is simply to return it, e.g.:
MyType getObject()
{
MyType object{};
// ...
return object;
}
Called as
auto obj = getObject();
In the specific case of CDROM_TOC_SESSION_DATA it likely makes sense to use an out parameter, since the class contains a flexible array member. That means that the parameter is almost certainly a reference/pointer to the beginning of some memory buffer that's larger than sizeof(CDROM_TOC_SESSION_DATA), and so must be handled in a somewhat peculiar way.
C++ is supposed to clean up state when it has exited its scope, so
data is a copy of state, correct?
In the first example, the statement
data = state
presumably copies the value of state into the object referred to by the reference parameter data, which is the same object identified by data in the caller's scope (those are simply the chosen names; they don't have to match). I say "presumably" because, in principle, an overloaded assignment operator could do something else entirely. In any library you would actually want to use, you can assume that the assignment operator does something sensible, but it may be important to know the details, so you should check.
The lifetimes of the local variable state and of the reference data itself end when the method exits. The local object state is cleaned up at that point, and no attempt may be made to access it thereafter. None of that affects the caller's data object.
And how exactly is it different from the following, which you see in many
examples:
CDROM_TOC_SESSION_DATA data;
GetSessionData(&data);
Not much. Here the caller passes a pointer instead of a reference. GetSessionData must be declared appropriately for that, and its implementation must explicitly dereference the pointer to access the caller's data object, but the general idea is the same for most intents and purposes. Pointer and reference are similar mechanisms for indirect access.
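A minimal sketch of the two styles side by side. SessionData is a hypothetical plain struct standing in for CDROM_TOC_SESSION_DATA (so the sketch compiles without the Windows SDK), and GetSessionDataRef/GetSessionDataPtr are illustrative names; only the parameter type and the explicit dereference differ.

```cpp
#include <cassert>

// Hypothetical plain struct standing in for CDROM_TOC_SESSION_DATA.
struct SessionData {
    unsigned char first_session;
    unsigned char last_session;
};

// Out-parameter taken by reference: the function cannot receive null.
void GetSessionDataRef(SessionData& data) {
    auto state = SessionData{};
    state.first_session = 1;
    state.last_session = 2;
    data = state;  // memberwise copy into the caller's object
}                  // state is destroyed here; the caller's copy is unaffected

// Same thing through a pointer: the caller writes &data and the
// function dereferences it explicitly.
void GetSessionDataPtr(SessionData* data) {
    auto state = SessionData{};
    state.first_session = 1;
    state.last_session = 2;
    *data = state;
}
```

At the call site, the difference is GetSessionDataRef(data) versus GetSessionDataPtr(&data).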
Which one makes more sense to use, or is the right way?
It depends. Passing a reference is generally a bit more idiomatic in C++, and it has the advantage that the method does not have to worry about receiving a null or invalid pointer. On the other hand, passing a pointer is necessary if the function has C linkage, or if you need to accommodate the possibility of receiving a null pointer.
I would like to use a std::map (or probably a std::unordered_map) where I insert custom object keys and double values, e.g. std::map<CustomClass,double>.
The order of the objects does not matter; only (fast) lookup is important. My idea is to insert the address/pointer of the object instead, since pointers already have a comparator defined, i.e. std::map<CustomClass*,double>.
In the question "Pointers as keys in map C++ STL", it has been answered that this can be done, but I am still a bit worried that there might be side effects that are hard to catch later.
Specifically:
Can the address of an object change during runtime of the program? And could this lead to undefined behavior for my lookup in the map?
A test program could be:
auto a = adlib::SymPrimitive();
auto b = adlib::SymPrimitive();
auto c = adlib::mul(a,b);
auto d = adlib::add(c,a);
// adlib::Assignment holds std::map which assigns values to a,b
auto assignment = adlib::Assignment({&a,&b},{4,2});
// a=4, b=2 -> c=8 -> d=12
adlib::assertEqual(d.eval_fcn(assignment), 12);
This is user code, so users could potentially put the variables into a vector, etc.
Update:
The answers made me think about users potentially inserting SymPrimitives into a vector. A simple scenario would be:
std::vector<adlib::SymPrimitive> syms{a,b};
auto assignment = adlib::Assignment({&syms[0],&syms[1]},{4,2}); // not allowed
The pitfall here is that syms[0] is a copy of a and has a different address. I could probably make it the user's responsibility to be aware of that.
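A small sketch of this pitfall, assuming a trivial stand-in for adlib::SymPrimitive (its real definition is not shown) and a plain std::map in place of adlib::Assignment; lookup and copy_is_found are hypothetical helpers. The copy stored in the vector is a distinct object, so its address is not a key in the map.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in for adlib::SymPrimitive.
struct SymPrimitive {
    int id = 0;
};

// Looks up the value assigned to the object whose address is p.
// Returns false if that address is not a key in the map.
bool lookup(const std::map<const SymPrimitive*, double>& m,
            const SymPrimitive* p, double& out) {
    auto it = m.find(p);
    if (it == m.end()) return false;
    out = it->second;
    return true;
}

// syms[0] is a *copy* of a, so it lives at a different address
// and is not found in the map.
bool copy_is_found() {
    SymPrimitive a, b;
    std::map<const SymPrimitive*, double> assignment{{&a, 4.0}, {&b, 2.0}};
    std::vector<SymPrimitive> syms{a, b};
    double value = 0.0;
    assert(lookup(assignment, &a, value) && value == 4.0);  // original: found
    return lookup(assignment, &syms[0], value);              // copy: not found
}
```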
Can the address of an object change during runtime of the program?
No. The address of an object never changes.
However, an object can stop existing at the address where it was created when the lifetime of the object ends.
Example:
std::map<CustomClass*,double> map;
{
CustomClass o;
map.emplace(&o, 3.14);
}
// the pointer within the map is now dangling; the pointed object does not exist
Also note that some operations on some containers cause the elements to be relocated into new objects while the old ones are destroyed (for example, when a std::vector grows beyond its capacity). After such an operation, references (in the general sense; this includes pointers and iterators) to those elements are invalid, and the behaviour of attempting to access through those references is undefined.
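For std::vector, growing past capacity() is the usual example. This sketch (the helper name is hypothetical) records the address of element 0 as an integer before and after a forced reallocation, so no pointer to a destroyed element is ever used; on typical implementations the new buffer is allocated before the old one is freed, so the two addresses differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns true if growing the vector past its capacity moved element 0
// to a new address.
bool reallocation_moves_elements() {
    std::vector<int> v;
    v.reserve(1);
    v.push_back(42);
    const auto before = reinterpret_cast<std::uintptr_t>(&v[0]);
    const auto old_capacity = v.capacity();
    while (v.size() < old_capacity + 1) v.push_back(0);  // forces reallocation
    const auto after = reinterpret_cast<std::uintptr_t>(&v[0]);
    assert(v[0] == 42);  // the value survived; only its location changed
    return before != after;
}
```

If addresses must stay stable while the container grows, std::list never invalidates references to the remaining elements, and std::deque keeps references valid when inserting at either end.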
Objects never change address during their lifetime. If all you want to do is look up some value associated with an object whose address is known at the time of the lookup, then using the address of the object as the key in a map should be perfectly safe.
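(Even if the object has been destroyed and/or deallocated, a lookup that only uses the pointer as a key and never dereferences it will not touch the dead object. Strictly speaking, though, any use of a pointer to deallocated storage, including comparing it, has implementation-defined behaviour, and a new object may later be created at the same address and match the stale key. So you will want to figure out how to remove entries from the map when objects are destroyed, or when for other reasons they shouldn't be in the map any more.)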
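One way to keep the map clean, sketched under the assumption that you control the key class: a hypothetical Tracked type registers itself in a map keyed by its own address and erases that entry in its destructor. A global registry like this is not thread-safe; it only illustrates the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

class Tracked {
public:
    static std::map<const Tracked*, double>& registry() {
        static std::map<const Tracked*, double> m;
        return m;
    }
    explicit Tracked(double value) { registry().emplace(this, value); }
    // A copy is a new object at a new address, so it gets its own entry.
    Tracked(const Tracked& other) { registry().emplace(this, registry().at(&other)); }
    Tracked& operator=(const Tracked& other) {
        registry()[this] = registry().at(&other);
        return *this;
    }
    ~Tracked() { registry().erase(this); }  // no stale key survives the object
};

std::size_t entries_after_scope() {
    {
        Tracked t(3.14);
        assert(Tracked::registry().count(&t) == 1);
    }  // t's destructor erased its entry
    return Tracked::registry().size();
}
```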
I was in an interview and was asked to give an example or scenario in C++ where we can't proceed without pointers, i.e. where we necessarily have to use a pointer.
I gave the example of a function returning an array whose size is not known: then we need to return a pointer, i.e. the name of the array, which is actually a pointer. But the interviewer said that is internal to arrays and asked for some other example.
So please help me with some other scenarios.
If you are using a C library with a function that returns a pointer, then you have to use pointers, not references.
There are many other cases (explicitly dealing with memory, for instance) - but these two came to my mind first:
linked data-structures
How: You need to reference parts of your structure in multiple places. You use pointers for that, because containers (which also use pointers internally) do not cover all your data-structure needs. For example,
class BinTree {
BinTree *left, *right;
public:
// ...
};
Why there is no alternative: there are no generic tree implementations in the standard library (not counting the ordered associative containers such as std::map and std::set, which are typically implemented as trees).
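A minimal sketch of the linked-structure case, using std::unique_ptr for the child links instead of raw pointers (still pointers, just with ownership attached). A node cannot hold its children by value, because a class cannot contain a member of its own type. The member names are illustrative.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Minimal (unbalanced) binary search tree.
class BinTree {
    int value;
    std::unique_ptr<BinTree> left, right;

public:
    explicit BinTree(int v) : value(v) {}

    void insert(int v) {
        std::unique_ptr<BinTree>& child = (v < value) ? left : right;
        if (child)
            child->insert(v);
        else
            child = std::make_unique<BinTree>(v);
    }

    // In-order traversal appends the values in sorted order.
    void in_order(std::vector<int>& out) const {
        if (left) left->in_order(out);
        out.push_back(value);
        if (right) right->in_order(out);
    }
};
```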
pointer-to-implementation pattern (pimpl)
How: Your public .hpp file has the methods, but only refers to internal state via an opaque Whatever *; and your internal implementation actually knows what that means and can access its fields. See:
Is the pImpl idiom really used in practice?
Why there is no alternative: if you provide your implementation in binary-only form, users of the header cannot access internals without decompiling/reverse engineering. It is a much stronger form of privacy.
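A single-file sketch of pimpl, with the header and implementation parts marked by comments; Widget and Impl are hypothetical names. The public class holds only a std::unique_ptr to an incomplete type, so the destructor and move operations are declared in the header but defined in the .cpp part, where Impl is complete.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// ---- widget.hpp (public header): only an opaque pointer to Impl ----
class Widget {
public:
    explicit Widget(std::string name);
    ~Widget();                            // defined where Impl is complete
    Widget(Widget&&) noexcept;
    Widget& operator=(Widget&&) noexcept;
    std::string describe() const;

private:
    struct Impl;                          // declared, not defined, in the header
    std::unique_ptr<Impl> impl;
};

// ---- widget.cpp (private implementation) ----
struct Widget::Impl {
    std::string name;
};

Widget::Widget(std::string name) : impl(std::make_unique<Impl>()) {
    impl->name = std::move(name);
}
Widget::~Widget() = default;
Widget::Widget(Widget&&) noexcept = default;
Widget& Widget::operator=(Widget&&) noexcept = default;

std::string Widget::describe() const {
    return impl->name;
}
```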
Anywhere you would want to use a reference but have to allow for null values.
This is common in libraries: if you pass a non-null pointer, the function stores a value through it; if you pass a null pointer, that output is skipped.
It is also a convention to use a pointer rather than a reference for function arguments that will be modified, to emphasize to the caller that the value can be changed.
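A sketch of the nullable out-parameter convention, using a hypothetical parse_digit function: the error message is written only if the caller passes a non-null pointer, which a reference could not express.

```cpp
#include <cassert>
#include <string>

// Returns the digit's value, or -1 on failure. The optional error message
// is written only when the caller supplies somewhere to put it.
int parse_digit(char c, std::string* error = nullptr) {
    if (c >= '0' && c <= '9') return c - '0';
    if (error) *error = std::string("not a digit: ") + c;
    return -1;
}
```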
Here are some cases:
Objects with long lifetimes. You create some object in a function, and you need that object (not a copy of it) afterwards.
If you created it on the stack, without pointers, the object would die when the function finishes. So you need to create the object in dynamic memory and return a pointer to it.
Stack space is not enough. You need an object that requires a lot of memory, so allocating it on the stack won't work, since the stack usually has much less space than the heap. Again, you need to create the object in dynamic memory on the heap and return a pointer to it.
You need reference semantics. You have a structure that you pass to some function, and you want the function to modify it. In this case you need to pass a pointer to the structure; otherwise you can't modify the original, since a copy of it will be passed to the function.
Note: in the latter case, using a pointer is not strictly necessary, since you can use a reference instead.
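A sketch of these three cases, with std::unique_ptr substituted for a raw owning pointer; BigTable, make_table and scale are hypothetical names. The 4 MiB table is allocated on the heap and outlives the function that created it, and scale modifies the caller's object through a pointer.

```cpp
#include <cassert>
#include <memory>

// Hypothetical large object: 4 MiB is risky as a local variable on many platforms.
struct BigTable {
    double values[1 << 19];
};

// The table is created on the heap and outlives this function; ownership is
// handed to the caller through a smart pointer rather than a raw one.
std::unique_ptr<BigTable> make_table() {
    auto table = std::make_unique<BigTable>();  // value-initialized: all zeros
    table->values[0] = 1.5;
    return table;
}

// Reference semantics: the function modifies the caller's object in place.
void scale(BigTable* table, double factor) {
    for (double& v : table->values) v *= factor;
}
```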
Pointers are also important for performance, for example with function calls. When you pass a value to a function, the argument's value is copied into the parameter; with a pointer, only the address is copied, and you can access the original object indirectly and do what you want with it.
I have a question about recommended coding technique. I have a tool for model analysis, and I sometimes need to pass a large amount of data (from a factory class to one that holds multiple heterogeneous chunks).
My question is whether there is some consensus about whether I should use pointers or move the ownership (I need to avoid copying when possible, as a single data block may be as big as 1 GB).
The pointer version would look like this:
class FactoryClass {
...
public:
static Data * createData() {
Data * data = new Data;
...
return data;
}
};
class StorageClass {
unique_ptr<Data> data_ptr;
...
public:
void setData(Data * _data_ptr) {
data_ptr.reset(_data_ptr);
}
};
void pass() {
Data * data = FactoryClass::createData();
...
StorageClass storage;
storage.setData(data);
}
Whereas the move version is like this:
class FactoryClass {
...
public:
static Data createData() {
Data data;
...
return data;
}
};
class StorageClass {
Data data;
...
public:
void setData(Data _data) {
data = move(_data);
}
};
void pass() {
Data data = FactoryClass::createData();
...
StorageClass storage;
storage.setData(move(data));
}
I like the move version better. Yes, I need to add move calls to the main code, but in the end I have just the objects in the storage, and I no longer have to care about pointer semantics.
However, I am not entirely comfortable using move semantics, which I do not understand in detail. (I do not mind the C++11 requirement, though, as the code already compiles only with GCC 4.7+.)
Would someone have a reference that would support either version? Or is there some other, preferred version of how to pass data?
I was not able to Google anything as the keywords usually led to other topics.
Thanks.
EDIT NOTE:
The second example got refactored to incorporate suggestions from the comments, the semantics remained unchanged.
When you are passing an object to a function, what you pass depends in part on how that function is going to use it. A function can use an object in one of three general ways:
1. It can simply reference the object for the duration of the function call, with the calling function (or its eventual parent up the call stack) maintaining ownership of the object. The reference in this case may be a constant reference or a modifiable reference. The function will not store this object long-term.
2. It can copy the object directly. It doesn't gain ownership of the original, but it does acquire a copy of the original, so it can store, modify, or do what it wants with the copy. Note that the difference between #1 and this is that the copy is made explicit in the parameter list: for example, taking a std::string by value. But this could also be as simple as taking an int by value.
3. It can gain some form of ownership of the object. The function then has some responsibility for the object's destruction. This also allows the function to store the object long-term.
My general recommendation for the parameter types for these paradigms are as follows:
1. Take the object by an explicit language reference where possible. If that's not possible, try a std::reference_wrapper. If that can't work, and no other solutions seem reasonable, then use a pointer. A pointer would be for things like optional parameters (though C++17's std::optional makes that less necessary; pointers will still have uses), language arrays (though again, we have objects that cover most of the uses of these), and so forth.
2. Take the object by value. That one's pretty non-negotiable.
3. Take the object either by value-move (i.e. move it into a by-value parameter) or by a smart pointer to the object (which will also be taken by value, since you're going to copy/move it anyway). The problem with your code is that you're transferring ownership via a pointer, but with a raw pointer. Raw pointers have no ownership semantics. The moment you allocate any pointer, you should immediately wrap it in some kind of smart pointer. So your factory function should have returned a unique_ptr.
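A sketch of the question's pointer version rewritten so that ownership is carried by std::unique_ptr from the factory onward. Data is a hypothetical stand-in holding a std::vector<double>; only the owning pointer changes hands, never the data.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical Data: a large buffer.
struct Data {
    std::vector<double> samples;
};

class FactoryClass {
public:
    // Ownership is explicit in the return type: the caller owns the result.
    static std::unique_ptr<Data> createData(std::size_t n) {
        auto data = std::make_unique<Data>();
        data->samples.resize(n);
        return data;
    }
};

class StorageClass {
    std::unique_ptr<Data> data_ptr;

public:
    // Taking unique_ptr by value forces callers to std::move it in,
    // so the transfer of ownership is visible at the call site.
    void setData(std::unique_ptr<Data> data) { data_ptr = std::move(data); }
    const Data* get() const { return data_ptr.get(); }
};
```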
Your case appears to be #3. Which you use between value-move and smart pointer is entirely up to you. If you have to heap allocate Data for some reason, then the choice is pretty much made for you. If Data can be stack allocated, then you have some options.
I would generally do this based on an estimation of Data's internal size. If internally, it's just a few pointers/integers (and by "few", I mean like 3-4), then putting it on the stack is fine.
Indeed, it can be better, because you'll have less chance of a double cache miss. If Data's member functions mostly just access data through another pointer and you store Data by pointer, then every function call on it will have to dereference your stored pointer to fetch the internal one, then dereference the internal one. That's two potential cache misses, since neither pointer has any locality with StorageClass.
If you store Data by value, it's much more likely that Data's internal pointer will already be in the cache. It has better locality with StorageClass's other members; if you accessed some of StorageClass before now, you already paid for a cache miss, so you are likely to already have Data in the cache.
But movement is not free. It's cheaper than a full copy, but it's not free. You're still copying the object's own members (such as its internal pointers), and possibly nulling out pointers in the original. But then again, allocating memory on the heap isn't free either. Nor is deallocating it.
But then again, if you're not moving it around very often (you move it around to get it to its final location, but little more after that), even moving a larger object would be fine. If you're using it more than you're moving it, then the cache locality of the object's storage will probably win out over the cost of moving.
There ultimately aren't a lot of technical reasons to pick one or the other. I would say to default to movement where reasonable.
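A sketch of the value-move version, assuming (hypothetically) that the large part of Data lives in a std::vector, as it usually would for a 1 GB block. buffer_was_moved checks that the buffer allocated by the factory is the same one that ends up inside StorageClass, i.e. the elements were never copied.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Data {                     // hypothetical, as in the question
    std::vector<double> samples;  // the large part lives in the vector's heap buffer
};

class FactoryClass {
public:
    static Data createData(std::size_t n) {
        Data data;
        data.samples.resize(n);
        return data;  // moved or elided, never deep-copied
    }
};

class StorageClass {
    Data data;

public:
    void setData(Data d) { data = std::move(d); }  // takes over d's buffer
    const Data& get() const { return data; }
};

bool buffer_was_moved() {
    Data data = FactoryClass::createData(1000);
    const double* buffer = data.samples.data();
    StorageClass storage;
    storage.setData(std::move(data));
    return storage.get().samples.data() == buffer;
}
```

If Data instead held a large fixed-size array member directly, moving it would copy every byte, and the unique_ptr version would be the better fit.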
More C++ learning questions. I've been using vectors primarily with raw pointers with a degree of success; however, I've been trying to use value objects instead. The first issue I'm running into is a compile error. I get errors when compiling the code below:
#include <string>
#include <vector>
class FileReference {
public:
FileReference(const char* path) : path(std::string(path)) {}
const std::string path;
};
int main() {
std::vector<FileReference> files;
// error C2582: 'operator =' function is unavailable in 'FileReference'
files.push_back(FileReference("d:\\blah\\blah\\blah"));
}
Q1: I'm assuming it's because I declared path as const, and/or because I didn't define an assignment operator. Why wouldn't a default operator work? Does const even win me anything here?
Q2: Secondly, in a vector of these value objects, are my objects memory-safe (meaning, will they be deleted for me automatically)? I read that a vector's elements are allocated on the heap by default; does that mean I need to delete anything?
Q3: Thirdly, to prevent copying of the entire vector, I have to create a parameter that passes the vector as a reference like:
// static
void FileReference::Query(const FileReference& reference, std::vector<FileReference>& files) {
// push stuff into the passed in vector
}
What's the standard for returning large objects that I don't want to die when the function dies. Would I benefit from using a shared_ptr here or something like that?
If any member variables are const, then a default assignment operator can't be created; the compiler doesn't know what you would want to happen. You would have to write your own operator overload, and figure out what behaviour you want. (For this reason, const member variables are often less useful than one might first think.)
So long as you're not taking ownership of raw memory or other resources, there's nothing to clean up. A std::vector always correctly destroys its contained elements when its lifetime ends, so long as they in turn correctly clean up their own resources. In your case, your only member variable is a std::string, which also looks after itself, so you're completely safe.
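A small sketch to observe this, using a hypothetical Counted class that tracks how many instances are alive. When the vector's lifetime ends, every element it holds is destroyed, with no delete required.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Counts live instances so we can observe when the vector destroys elements.
struct Counted {
    static int live;
    std::string path;
    explicit Counted(std::string p) : path(std::move(p)) { ++live; }
    Counted(const Counted& o) : path(o.path) { ++live; }
    Counted(Counted&& o) noexcept : path(std::move(o.path)) { ++live; }
    ~Counted() { --live; }
};
int Counted::live = 0;

int live_after_vector_dies() {
    {
        std::vector<Counted> files;
        files.emplace_back("d:\\blah");
        files.emplace_back("d:\\other");
        assert(Counted::live == 2);
    }  // the vector's lifetime ends: both elements are destroyed
    return Counted::live;
}
```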
You could use a shared pointer, but unless you do profiling and identify a bottleneck here, I wouldn't worry about it. In particular, you should read about copy elision, which the compiler can do in many circumstances.
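A sketch of returning the result by value instead of filling an out-parameter; query is a hypothetical free-function variant of FileReference::Query that builds strings rather than FileReference objects. Since C++11, returning a local vector either elides the copy or moves it, so the elements themselves are not copied.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Builds and returns a vector by value; no shared_ptr is needed for the
// result to outlive the function.
std::vector<std::string> query(const std::string& prefix, std::size_t count) {
    std::vector<std::string> files;
    files.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        files.push_back(prefix + std::to_string(i));
    return files;
}
```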
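In C++03, elements of a vector must be Assignable. From section 23.2.4, Class template vector, of the C++03 standard:
...the stored object shall meet the requirements of Assignable.
Having a const member makes the class unassignable. (C++11 relaxed this: the requirements now depend on which member functions you use, and push_back only needs the element type to be CopyInsertable or MoveInsertable.)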
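A sketch confirming both halves of this with a C++11-or-later compiler: the const member still makes FileReference unassignable, but push_back compiles because it only requires the element to be copy- or move-constructible. Member functions that may shift elements, such as insert and erase, still require assignment and would fail to compile. make_files is a hypothetical helper.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

class FileReference {
public:
    FileReference(const char* p) : path(p) {}
    const std::string path;
};

// The const member deletes the implicit copy assignment operator...
static_assert(!std::is_copy_assignable<FileReference>::value,
              "const member makes FileReference unassignable");

// ...but push_back only needs construction, so this compiles in C++11 and later.
std::vector<FileReference> make_files() {
    std::vector<FileReference> files;
    files.push_back(FileReference("d:\\blah\\blah\\blah"));
    return files;
}
```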
As the elements are being stored by value, they will be destructed when the vector is destroyed or when they are removed from the vector. If the elements were raw pointers, then they would have to be explicitly deleted.