How to mark a function as invalidating its argument - c++

I have a function f that accepts a vector of pointers. Once the function f finishes, these pointers are no longer valid. Note, there is no real need to change the vector itself, I just want to encourage callers not to use the pointers after the call to f. There are three possible signatures for f:
The move signature
void f(vector<void*> &&v); // because the pointers in v are no longer valid.
// This signature also allows me to have f call clear() on v.
The const signature
void f(const vector<void*> &v); // because the pointers in v are no longer valid,
// but we don't have to change the vector v.
The pointer signature
void f(vector<void*> *v); // The function modifies v in a predictable way
// (it clears it). A pointer is used instead of a reference so that
// calls to the function will have a '&' which clearly shows to the reader
// that this function modifies its argument. '*' is different from '&&' since '&&'
// may imply "do not use v, but it is unknown how it will be modified" while
// '*' implies a clear semantic for how v is changed.
Which signature is more idiomatic to use in C++11?

How about
void f(vector<void*> v);
And to use it:
vector<void*> myVec = /*...*/;
f(std::move(myVec));
If f logically needs ownership of a vector, this is the idiomatic way. It allows the caller to decide whether to move or copy a vector to f.
If the caller actually wants f to modify his vector (so the vector is actually an in/out argument) then this doesn't suit your needs. However in/out arguments suck. Functions should take input as arguments and return output as a return value. That's what god intended.
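A minimal sketch of the by-value signature in use, assuming a hypothetical function consume that stands in for f and simply reports how many pointers it received. It shows the caller choosing between a copy and a move at the call site:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for f: takes ownership of its own vector
// and reports how many pointers it was given.
std::size_t consume(std::vector<void*> v) {
    std::size_t n = v.size();
    v.clear();  // only the function's own copy is cleared
    return n;
}

std::size_t demo_copy_then_move() {
    int a = 1, b = 2;
    std::vector<void*> ptrs{&a, &b};

    std::size_t copied = consume(ptrs);  // caller chose to copy
    assert(ptrs.size() == 2);            // the caller's vector is untouched

    // Caller chose to move; ptrs is left valid but unspecified afterwards.
    std::size_t moved = consume(std::move(ptrs));
    return copied + moved;
}
```

The std::move at the call site is what tells the reader that the vector should not be relied on afterwards.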

If you really want to do this with the type system, it is always possible to encode extra information using your own type.
template<class T>
struct invalidates_contained_pointers;
template<class T>
invalidates_contained_pointers<T>* contents_will_be_invalidated(T* ptr) {
return reinterpret_cast<invalidates_contained_pointers<T>*>(ptr);
}
void f(invalidates_contained_pointers<vector<void*>> *v){
auto pv = reinterpret_cast<vector<void*> *>(v);
// ...
}
f(contents_will_be_invalidated(&vec));
A similar approach can be used for references.

The short answer: there is no way to do this. The only thing that is 'official' is the inverse: there is a signature that promises that a function f(..) will NOT change its arguments: the const keyword.
Typically one adheres to the following:
functions that do not modify their arguments either get their arguments as copy-by-value or mark their arguments explicitly with const
arguments that are passed by non-const reference, move or pointer to non-const object should be read as "there is a fair chance this argument is modified by the called function f(...)".

As others have said, the type system doesn't allow you to indicate something like "don't use this data after this function call". What you could do:
void f(vector<void*> &v)
{
// ... use v ...
v.clear(); // encourage callers not to use the pointers after the call
}

f should clear the vector if it is deleting the pointers (or freeing whatever they are handles to). It is just pointlessly dangerous to leave the caller with a vector of indeterminate values.
So f should accept the vector by non-const reference. Whether you want to make this lvalue reference or rvalue reference is up to you; but the lvalue version seems simpler.
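As a rough sketch of the lvalue-reference variant, assuming the pointers own heap-allocated int objects (the question uses void*, which cannot be deleted directly, so int* is used here purely for illustration) and a hypothetical name release_all:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for f: frees every pointee, then clears the
// vector so the caller is not left holding dangling pointers.
void release_all(std::vector<int*>& v) {
    for (int* p : v) {
        delete p;
    }
    v.clear();
}

bool demo_release_all() {
    std::vector<int*> v{new int(1), new int(2)};
    release_all(v);
    return v.empty();  // nothing left to misuse
}
```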

Out of those three: vector<void*> * requires an lvalue to take the address of. const vector<void*> & allows either lvalues or rvalues to be passed in. vector<void*> && only allows rvalues to be passed in.
Based on your question, your function makes sense to be called with either lvalues or rvalues, so const vector<void*> & is the obvious choice.
There is no way to indicate through the type system that the caller should stop using the contained pointers, and you shouldn't try to indicate that through the type system. Indicate that through the documentation.

Related

Move semantics and overload

I think my understanding of rvalue references and move semantics has some holes in it.
As far as I've understood rvalue references so far, I could implement a function f in two ways such that it profits from move semantics.
The first version: implement both
void f(const T& t);
void f(T&& t);
This would result in quite some redundancy, as both versions are likely to have (almost) identical implementation.
second version: implement only
void f(T t);
Calling f would result in calling either the copy or the move constructor of T.
Question.
How do the two versions compare to each other? My suspicion:
In the second version, (ownership of) dynamically allocated data may be moved by the move constructor, while non-dynamically allocated data in t and its members needs to be copied. In the first version, neither version allocates any additional memory (except the pointer behind the reference).
If this is correct, can I avoid writing the implementation of f basically twice without the drawback of the second version?
If you need to take a T&& parameter, it usually means you want to move the object somewhere and keep it. This kind of function is typically paired up with a const T& overload so it can accept both rvalues and lvalues.
In this situation, the second version (only one overload with T as a parameter) is always less efficient, but most likely not by much.
With the first version, if you pass an lvalue, the function takes a reference and then makes a copy to store it somewhere. That's one copy construction or assignment. If you pass an rvalue, the function again takes a reference and then moves the object to store it. That's one move construction or assignment.
With the second version, if you pass an lvalue, it gets copied into the parameter, and then the function can move that. If you pass an rvalue, it gets moved (assuming the type has a move constructor) into the parameter, and then the function can also move that. In both cases, that's one more move construction or assignment than with the first version.
Also note that copy elision can happen in some cases.
The benefit of the second version is that you don't need multiple overloads. With the first version, you need 2^n overloads for n parameters that you need copied or moved.
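The counts described above can be checked with a small instrumented type. This is only a sketch: Counted, StoreByOverloads and StoreByValue are hypothetical names, and the setters store the argument by assignment.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that counts its own copies and moves.
struct Counted {
    static int copies;
    static int moves;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept { ++moves; }
    Counted& operator=(const Counted&) { ++copies; return *this; }
    Counted& operator=(Counted&&) noexcept { ++moves; return *this; }
};
int Counted::copies = 0;
int Counted::moves = 0;

void reset_counts() { Counted::copies = 0; Counted::moves = 0; }

// First version: one overload per value category.
struct StoreByOverloads {
    Counted stored;
    void set(const Counted& c) { stored = c; }        // one copy
    void set(Counted&& c) { stored = std::move(c); }  // one move
};

// Second version: a single by-value parameter.
struct StoreByValue {
    Counted stored;
    void set(Counted c) { stored = std::move(c); }    // parameter init + one move
};
```

With an lvalue argument, StoreByOverloads performs one copy while StoreByValue performs one copy plus one move; with std::move, they perform one and two moves respectively.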
In the simple case of just one parameter, I sometimes do something like this to avoid repeating code:
void f(const T& t) {
f_impl(t);
}
void f(T&& t) {
f_impl(std::move(t));
}
// this function is private or just inaccessible to the user
template<typename U>
void f_impl(U&& t) {
// use std::forward<U>(t)
}
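For a concrete picture, here is one way the same pattern might look inside a class. Holder, set, set_impl and get are hypothetical names, and std::string plays the role of T:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical class using the two-overloads-plus-forwarding-helper pattern.
class Holder {
public:
    void set(const std::string& s) { set_impl(s); }        // lvalues: copy
    void set(std::string&& s) { set_impl(std::move(s)); }  // rvalues: move
    const std::string& get() const { return value_; }

private:
    // Single implementation; std::forward preserves the value category.
    template <typename U>
    void set_impl(U&& s) {
        value_ = std::forward<U>(s);
    }

    std::string value_;
};
```

Keeping set_impl private means users only ever see the two non-template overloads.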

What in the world is T*& return type

I have been looking at vector implementations and stumbled upon a line that confuses me as a naive C++ learner.
What is T*& return type?
Is this merely a reference to a pointer?
Why would this be useful then?
link to code: https://github.com/questor/eastl/blob/56beffd7184d4d1b3deb6929f1a1cdbb4fd794fd/vector.h#L146
T*& internalCapacityPtr() EASTL_NOEXCEPT { return mCapacityAllocator.first(); }
It's a reference-to-a-pointer to a value of type T which is passed as a template argument, or rather:
There exists an instance of VectorBase<T> where T is specified by the program, T could be int, string or anything.
The T value exists as an item inside the vector.
A pointer to the item can be created: T* pointer = &this->itemValues[123]
You can then create a reference to this pointer: https://msdn.microsoft.com/en-us/library/1sf8shae.aspx?f=255&MSPPError=-2147217396
Correct
If you need to use a value "indirectly" then references-to-pointers are more convenient to use than pointer-to-pointers, since the caller doesn't have to write the extra dereference. Compilers typically implement both the same way, so don't expect a speed difference.
http://c-faq.com/decl/spiral.anderson.html
This would be a reference to a pointer of type T. References to pointers can be a bit tricky but are used a lot with smart pointers when using a reference saves an increment to the reference counter.
Types in C++ should be read from right to left. Following this, it becomes a: Reference to a pointer of T. So your assumption is correct.
References to pointers are very useful, this is often used as an output argument or an in-out argument. Let's consider a specific case of std::swap
template <typename T>
void swap(T*& lhs, T*& rhs) {
T *tmp = rhs;
rhs = lhs;
lhs = tmp;
}
As with every type, it can be used as a return value. In the standard library, you can find this return type for std::vector<int *>::operator[], allowing v[0] = nullptr.
On the projects that I've worked on, I haven't seen many uses of this kind of getter that allows changing the internals. However, it does allow you to write a single method for reading and writing the value of the member.
In my opinion, I would call it a code smell as it makes it harder to understand which callers do actual modifications.
The story is of course different when returning a const reference to a member, as that might prevent copies. Though preventing the copy of a pointer doesn't add value.
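To make the "getter that allows changing the internals" idea concrete, here is a small sketch. Buffer and dataPtr are hypothetical names, loosely modelled on the internalCapacityPtr accessor quoted in the question:

```cpp
#include <cassert>

// Hypothetical class exposing a pointer member by reference.
struct Buffer {
    int* data = nullptr;

    // Returns a reference to the member pointer itself, so the caller
    // can both read it and re-seat it.
    int*& dataPtr() { return data; }
};

bool demo_reference_to_pointer() {
    int x = 42;
    Buffer b;
    b.dataPtr() = &x;  // writes to b.data through the returned reference
    return b.data == &x && *b.dataPtr() == 42;
}
```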

How exactly do I use the functions push_back and pop_back()? I looked them up in the following links but still don't understand

http://www.cplusplus.com/reference/vector/vector/push_back/ (C++11 Version)
What is the difference and/or advantages of void push_back (const value_type& val); & void push_back (value_type&& val) and which do you suggest I use?;
I don't understand how to fill in the arguments (const value_type& val) & (value_type&& val)
I don't understand the second sentence under the parameter section. (It's a bit too wordy for me to get). I do understand what val is though
It doesn't give an example I can understand real well. Can I get other examples using vectors or some video links that explain the use of the function in practice better?
http://www.cplusplus.com/reference/vector/vector/pop_back/
It doesn't give an example I can understand real well. Can I get other examples using vectors or some video links that explain the use of the function in practice better?
If you are a beginner, just skip over the additional qualifiers like const, & and &&. The methods in the STL are implemented in such a way that they behave consistently across all overloads:
I will give you a small example here:
std::vector<int> myvector;
myvector.push_back(5);
int five = 5;
myvector.push_back(five);
Now the more in depth part of the answer:
First (const value_type& val). The & character signals, that we take the argument by reference, that means we don't copy the argument, but get a fancy pointer, that will behave like the object itself.
You may not want, that your variable is changed, if you push it back to a vector. To get a promise, by the programmer of the STL, that he will not change your variable while pushing it back to the vector, he can add the const before the type.
The reason it is implemented that way, is that it may prevent an unneeded copy. (First copy the argument onto the stack to call push_back and the second time copy it at the position in the vector. The first copy is unnecessary and saved by the const reference.)
This is all nice and simple, but temporary values deserve special treatment. A temporary has no name, so nobody can use it after the call, and copying it would be wasted work. Take the following line for example.
myvector.push_back(5);
A const reference can bind to the temporary 5, so the first overload would work here, but it would have to copy the value into the vector. To avoid that copy, C++11 added rvalue references for such temporary objects. If you want to write a function that takes an rvalue, you can do so by using type&& rvalue_variable. The second overload binds to the temporary and moves it into the vector by using the move constructor of the type. For trivial types like int, this will be the same as the copy constructor. For complex types like std::vector there are shortcuts one can take if one is allowed to rip the temporary object apart. In case of the vector, it does not need to copy all the data in the vector to a new location, but can use the pointer of the old vector in the new object.
Now we can look at the example again:
std::vector<int> myvector;
myvector.push_back(5); // both overloads are viable; the compiler prefers push_back(int&&) for the temporary
int five = 5;
myvector.push_back(five); // push_back(const int&) can be applied and is used by the compiler
// The resulting vector after this has the two values [5, 5]
// and we see, that we don't need to care about it.
This should show you how you can use both of them.
push_back():
std::vector<int> vec = { 0, 1, 2 };
vec.push_back(3);
pop_back():
vec.pop_back();
vec.pop_back();
If you need more clarification:
push_back(const T& val) adds its parameter to the end of the vector, increasing the size by 1. The capacity grows only if the new size would exceed it.
pop_back() doesn't take any parameters and removes the last element of the vector, effectively reducing the size by 1.
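A short sketch of how size() changes across these calls, using a hypothetical helper demo_push_pop. Note that pop_back() returns nothing, so the last element is read with back() first:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

std::size_t demo_push_pop() {
    std::vector<int> vec = {0, 1, 2};

    vec.push_back(3);                      // size goes from 3 to 4
    assert(vec.size() == 4 && vec.back() == 3);
    assert(vec.capacity() >= vec.size());  // capacity never lags behind size

    int last = vec.back();                 // read the element before removing it
    vec.pop_back();                        // size goes from 4 to 3
    assert(last == 3 && vec.size() == 3);

    vec.pop_back();                        // size goes from 3 to 2
    return vec.size();
}
```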
Update:
I'm trying to tackle your questions one by one, if there is anything unclear, let me know.
What is the difference and/or advantages of void push_back (const value_type& val); & void push_back (value_type&& val) and which do you suggest I use?;
Prior to C++11, rvalue-references didn't exist. That's why push_back was implemented as vector.push_back(const value_type& val). If you have a compiler that supports C++11 or later, std::vector.push_back() will be overloaded for const lvalue references and rvalue references.
I don't understand how to fill in the arguments (const value_type& val) & (value_type&& val)
You as a programmer do NOT choose how you pass arguments to push_back(), the compiler does it for you automagically, in most cases.
I don't understand the second sentence under the parameter section. (It's a bit too wordy for me to get). I do understand what val is though
value_type is the element type of the vector that you declared. If a vector is declared with std::string, then it can only hold std::string.
std::vector<std::string> vec;
vec.push_back("str"); // Ok. "str" is allowed.
vec.push_back(12); // Compile-time error. 12 is not allowed.
What is the difference and/or advantages of void push_back (const value_type& val); & void push_back (value_type&& val) and which do you suggest I use?
void push_back(const value_type&) takes an argument, that is then copied into the vector. This means that a new element is initialized as a copy of the passed argument, as defined by an appropriate allocator.
void push_back(value_type&&) takes an argument that is then moved into the container (arguments of this kind are called rvalue expressions).
The usage of either of two depends on the results you want to achieve.
I don't understand how to fill in the arguments (const value_type& val) & (value_type&& val)
In most cases you shouldn't think about which version to use, as the compiler will take care of this for you. The second version will be called for any rvalue argument and the first one for the rest. In the rare case when you want to ensure the second overload is called, you can use std::move to explicitly convert the argument expression into an xvalue (which is a kind of rvalue).
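The overload selection can be observed with a small instrumented element type. This is a sketch only: Tracer is a hypothetical type that records whether it was created by the move constructor, and reserve() is called up front so that no reallocation disturbs the flags.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical element type that remembers how it was constructed.
struct Tracer {
    bool moved_in = false;
    Tracer() = default;
    Tracer(const Tracer&) : moved_in(false) {}
    Tracer(Tracer&&) noexcept : moved_in(true) {}
};

bool demo_overload_choice() {
    std::vector<Tracer> v;
    v.reserve(3);               // avoid reallocation during the pushes

    Tracer t;
    v.push_back(t);             // lvalue: push_back(const value_type&), copies
    v.push_back(Tracer{});      // temporary: push_back(value_type&&), moves
    v.push_back(std::move(t));  // explicit std::move: also the && overload

    return !v[0].moved_in && v[1].moved_in && v[2].moved_in;
}
```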
I don't understand the second sentence under the parameter section. (It's a bit too wordy for me to get). I do understand what val is though
The sentence in question is:
Member type value_type is the type of the elements in the container, defined in vector as an alias of its first template parameter (T).
This means that value_type is the same type as the type of vector's elements. E.g., if you have vector<int> then value_type is the same as int and for vector<string> the value_type is string.
Because vector is not an ordinary type, but a template, you must specify a type parameter (which goes into angle brackets <> after vector) when defining a variable. Inside the vector template specification this type parameter T is then aliased with value_type:
typedef T value_type;
It doesn't give an example I can understand real well. Can I get other examples using vectors or some video links that explain the use of the function in practice better?
The main thing you need to remember is that vector behaves like a simple array, but with a dynamically changeable size and some additional information, like its length. push_back is simply a function that adds a new element at the end of this pseudo-array. There are, of course, a lot of subtle details, but they are inconsequential in most cases.
The basic usage is like this:
vector<int> v; // v is empty
v.push_back(1); // v now contains one element
vector<float> v2 { 1.0, 2.0 }; // v2 is now a vector with two elements
float f = v2.back(); // f is now equal to 2.0 (pop_back() itself returns nothing)
v2.pop_back(); // v2 now has one element
The best way to understand how it works is to try using it yourself.

Is it costly to pass an initializer_list as a list by value?

I want to pass a std::list as a parameter to fn(std::list<int>), so I do fn({10, 21, 30}) and everybody is happy.
However, I've come to learn that one shouldn't pass list by value, cause it's costly. So, I redefine my fn as fn(std::list<int> &). Now, when I do the call fn({10, 21, 30}), I get an error: candidate function not viable: cannot convert initializer list argument to 'std::list<int> &'.
QUESTION TIME
Is the "you shall not pass a costly object by value" rule valid here? We aren't passing a list after all, but an initializer_list, no?
If the rule still applies, what's the easy fix here?
I guess my doubt comes from the fact that I don't know clearly what happens when one passes an initializer_list argument to a function that accepts a list.
Is list generated on the spot and then passed by value? If not, what is it that actually happens?
However, I've come to learn that one shouldn't pass list by value, cause it's costly.
That's not entirely accurate. If you need to pass in a list that the function can modify, where the modifications shouldn't be externally visible, you do want to pass a list by value. This gives the caller the ability to choose whether to copy or move from an existing list, so gives you the most reasonable flexibility.
If the modifications should be externally visible, you should prevent temporary list objects from being passed in, since passing in a temporary list object would prevent the caller from being able to see the changes made to the list. The flexibility to silently pass in temporary objects is the flexibility to shoot yourself in the foot. Don't make it too flexible.
If you need to pass in a list that the function will not modify, then const std::list<T> & is the type to use. This allows either lvalues or rvalues to be passed in. Since there won't be any update to the list, there is no need for the caller to see any update to the list, and there is no problem passing in temporary list objects. This again gives the caller the most reasonable flexibility.
Is the "you shall not pass a costly object by value" rule valid here? We aren't passing a list after all, but an initializer_list, no?
You're constructing a std::list from an initializer list. You're not copying that std::list object, but you are copying the list items from the initializer list to the std::list. If the copying of the list items is cheap, you don't need to worry about it. If the copying of the list items is expensive, then it should be up to the caller to construct the list in some other way. Either way, it doesn't need to be something to worry about inside your function.
If the rule still applies, what's the easy fix here?
Both passing std::list by value or by const & allow the caller to avoid pointless copies. Which of those you should use depends on the results you want to achieve, as explained above.
Is list generated on the spot and then passed by value? If not, what is it that actually happens?
Passing the list by value constructs a new std::list object in the location of the function parameter, using the function argument to specify how to construct it. This may or may not involve a copy or a move of an existing std::list object, depending on what the caller specifies as the function argument.
The expression {10, 21, 30} will construct an initializer_list<int>.
This in turn will be used to create a list<int>.
That list will be a temporary, and temporaries will not bind to a non-const reference.
One fix would be to change the prototype for your function to
fn(const std::list<int>&)
This means that you can't edit it inside the function, and you probably don't need to.
However, if you must edit the parameter inside the function, taking it by value would be appropriate.
Also note, don't optimize prematurely. You should always use idiomatic constructs that clearly represent what you want to do, and for functions, that almost always means parameters by const& and return by value.
This is easy to use right, hard to use wrong, and almost always fast enough.
Optimization should only be done after profiling, and only for the parts of the program that you have measured to need it.
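A minimal sketch of the const& fix, assuming a hypothetical function sum in place of fn. The braced list creates a temporary std::list<int>, which is allowed to bind to the const reference:

```cpp
#include <cassert>
#include <list>
#include <numeric>

// Hypothetical stand-in for fn: reads the list without modifying it.
int sum(const std::list<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0);
}

int demo_sum() {
    std::list<int> existing{1, 2};
    // First call binds to a temporary list, second to a named one (no copy).
    return sum({10, 21, 30}) + sum(existing);
}
```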
Quoting the C++14 standard draft (emphasis mine):
18.9 Initializer lists [support.initlist]
2: An object of type initializer_list provides access to an array of
objects of type const E. [ Note: A pair of pointers or a pointer plus
a length would be obvious representations for initializer_list.
initializer_list is used to implement initializer lists as specified
in 8.5.4. Copying an initializer list does not copy the underlying
elements. —end note ]
std::list has a constructor which is used to construct from std::initializer_list. As you can see, it takes it by value.
list(initializer_list<T>, const Allocator& = Allocator());
If you are never going to modify your parameter, then fn(const std::list<int>&) will do just fine. Otherwise, fn(std::list<int>) will suffice.
To answer your questions:
Is the "you shall not pass a costly object by value" rule valid here?
We aren't passing a list after all, but an initializer_list, no?
std::initializer_list is not a costly object. But std::list<int> surely sounds like a costly object
If the rule still applies, what's the easy fix here?
Again, it's not costly
Is list generated on the spot and then passed by value? If not, what is it that actually happens?
Yes, it is... your list object is created on the spot at run-time right before the program enters your function scope
However, I've come to learn that one shouldn't pass list by value, cause it's costly. So, I redefine my fn as fn(std::list<int> &). Now, when I do the call fn({10, 21, 30}), I get an error: candidate function not viable: cannot convert initializer list argument to 'std::list<int> &'.
A way to fix the problem would be:
void fn(std::list<int>& v) {
std::cout << v.size();
}
void fn(std::list<int>&& v) {
fn(v); // v has a name, so it is an lvalue here and the overload above is called
}
Now fn({1, 2, 3 }); works as well (it will call the second overloaded function that accepts a list by rvalue ref, and then fn(v); calls the first one that accepts lvalue references).
void fn(std::list<int> v)
{
}
The problem with this function is that it can be called like:
list<int> biglist;
fn(biglist);
And it will make a copy. And it will be slow. That's why you want to avoid it.
I would give you the following solutions:
Overload your fn function to accept both rvalues and lvalues properly as shown before.
Only use the second function (the one that accepts only rvalue references). The problem with this approach is that it produces a compile error when it's called with an lvalue, which is something you want to allow.
Like the other answers and comments you can use a const reference to the list.
void fn(const std::list<int>& l)
{
for (auto it = l.begin(); it != l.end(); ++it)
{
*it; //do something
}
}
If this fn function is heavily used and you are worried about the overhead of constructing and destructing the temporary list object, you can create a second function that receives the initializer_list directly that doesn't involve any copying whatsoever. Using a profiler to catch such a performance hot spot is not trivial in many cases.
void fn(const std::initializer_list<int>& l)
{
for (auto it = l.begin(); it != l.end(); ++it)
{
*it; //do something
}
}
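A sketch of how the two overloads might sit side by side, assuming a hypothetical function name total. When the argument is a braced list, overload resolution prefers the std::initializer_list parameter, so no temporary std::list is built for that call:

```cpp
#include <cassert>
#include <initializer_list>
#include <list>

// Hypothetical overload for existing lists.
int total(const std::list<int>& values) {
    int sum = 0;
    for (int v : values) sum += v;
    return sum;
}

// Hypothetical overload for braced lists; initializer_list is cheap
// to pass by value because copying it does not copy the elements.
int total(std::initializer_list<int> values) {
    int sum = 0;
    for (int v : values) sum += v;
    return sum;
}

int demo_total() {
    std::list<int> existing{1, 2, 3};
    return total(existing) + total({10, 21, 30});  // 6 + 61
}
```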
You can keep taking std::list<> by value, because you're in fact constructing a temporary list, and passing the initializer_list itself by value is cheap. Also, accessing that list later can be faster than through a reference because you avoid dereferencing.
You could also work around it by taking const std::list& as the parameter, or like this:
void foo( std::list<int> &list ) {}
int main() {
std::list<int> list{1,2,3};
foo( list );
}
The list is created in the function's scope and this constructor is called
list (initializer_list<value_type> il,
const allocator_type& alloc = allocator_type())
So there's no passing of a list by value. But if you use that function and pass an existing list as the argument, it'll be passed by value.

Overloading operator [] for a sparse vector

I'm trying to create a "sparse" vector class in C++, like so:
template<typename V, V Default>
class SparseVector {
...
}
Internally, it will be represented by an std::map<int, V> (where V is the type of value stored). If an element is not present in the map, we will pretend that it is equal to the value Default from the template argument.
However, I'm having trouble overloading the subscript operator, []. I must overload the [] operator, because I'm passing objects from this class into a Boost function that expects [] to work correctly.
The const version is simple enough: check whether the index is in the map, return its value if so, or Default otherwise.
However, the non-const version requires me to return a reference, and that's where I run into trouble. If the value is only being read, I do not need (nor want) to add anything to the map; but if it's being written, I possibly need to put a new entry into the map. The problem is that the overloaded [] does not know whether a value is being read or written. It merely returns a reference.
Is there any way to solve this problem? Or perhaps to work around it?
There may be some very simple trick, but otherwise I think operator[] only has to return something which can be assigned from V (and converted to V), not necessarily a V&. So I think you need to return some object with an overloaded operator=(const V&), which creates the entry in your sparse container.
You will have to check what the Boost function does with its template parameter, though - a user-defined conversion to V affects what conversion chains are possible, for example by preventing there being any more user-defined conversions in the same chain.
Don't let the non-const operator[] implementation return a reference, but a proxy object. You can then implement the assignment operator of the proxy object to distinguish read accesses to operator[] from write accesses.
Here's some code sketch to illustrate the idea. This approach is not pretty, but well - this is C++. C++ programmers don't waste time competing in beauty contests (they wouldn't stand a chance either). ;-)
template <typename V, V Default>
ProxyObject<V, Default> SparseVector<V, Default>::operator[]( int i ) {
// At this point, we don't know whether operator[] was called for a read or a write,
// so we return a proxy object and defer the decision until later
return ProxyObject<V, Default>( this, i );
}
template <typename V, V Default>
class ProxyObject {
public:
ProxyObject( SparseVector<V, Default> *v, int idx );
ProxyObject<V, Default> &operator=( const V &v ) {
// If we get here, we know that operator[] was called to perform a write access,
// so we can insert an item in the vector if needed
return *this;
}
operator V() {
// If we get here, we know that operator[] was called to perform a read access,
// so we can simply return the existing object
}
};
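Filling in the blanks, a compilable version of the proxy idea might look roughly like this. It is a sketch under assumptions not stated in the question: the proxy is a nested class named Proxy, stored() is a hypothetical helper that exposes the map size, and assigning Default erases the entry rather than storing it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

template <typename V, V Default>
class SparseVector {
public:
    // Returned by the non-const operator[]; decides later whether
    // the access is a read or a write.
    class Proxy {
    public:
        Proxy(SparseVector& owner, int index) : owner_(owner), index_(index) {}

        // Write access: create, update or (for Default) remove the entry.
        Proxy& operator=(const V& value) {
            if (value == Default) {
                owner_.data_.erase(index_);
            } else {
                owner_.data_[index_] = value;
            }
            return *this;
        }

        // Read access: never inserts anything into the map.
        operator V() const { return owner_.get(index_); }

    private:
        SparseVector& owner_;
        int index_;
    };

    Proxy operator[](int i) { return Proxy(*this, i); }
    V operator[](int i) const { return get(i); }

    // Number of entries actually stored in the map.
    std::size_t stored() const { return data_.size(); }

private:
    V get(int i) const {
        auto it = data_.find(i);
        return it == data_.end() ? Default : it->second;
    }

    std::map<int, V> data_;
};
```

Reading sv[5] on an empty SparseVector<int, 0> yields 0 without growing the map, while sv[5] = 7 inserts exactly one entry.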
I wonder whether this design is sound.
If you want to return a reference, that means that clients of the class can store the result of calling operator[] in a reference, and read from/write to it at any later time. If you do not return a reference, and/or do not insert an element every time a specific index is addressed, how could they do this? (Also, I've got the feeling that the standard requires a proper STL container providing operator[] to have that operator return a reference, but I'm not sure of that.)
You might be able to circumvent that by giving your proxy also an operator V&() (which would create the entry and assign the default value), but I'm not sure this wouldn't just open another loophole in some case I hadn't thought of yet.
std::map solves this problem by specifying that the non-const version of that operator always inserts an element (and not providing a const version at all).
Of course, you can always say this is not an off-the-shelf STL container, and operator[] does not return plain references users can store. And maybe that's OK. I just wonder.