Pass vectors by pointer and reference in C++ - c++

A quick question about how to safely pass and use vectors in c++.
I know that when using vectors you have to be very careful with addresses to them and their elements because when you dynamically change their size they may change their address (unless you use reserve etc. but I'm imagining I will not know how much space I will need).
Now I want to pass an existing vector (created elsewhere) to a function which adapts it and changes its size etc., but I'm a little unclear as to what is safe to do because I would normally achieve all of this with pointers. On top of this there is using references to the vector, and this just muddies the water for me.
For instance take the two following functions and comments in them
void function1(std::vector<int>* vec){
std::cout<<"the size of the vector is: "<<vec->size()<<std::endl; //presumably valid here
for (int i=0;i<10;i++){
(*vec).push_back(i); //Is this safe? Or will this fail?
// Or: vec->push_back(i); Any difference?
}
std::cout<<"the size of the vector is: "<<vec->size()<<std::endl; //Is this line valid here??
}
AND
void function2(std::vector<int>& vec){
std::cout<<"the size of the vector is: "<<vec.size()<<std::endl; //presumably valid here
for (int i=0;i<10;i++){
vec.push_back(i); //Is this safe? Or will this fail?
}
std::cout<<"the size of the vector is: "<<vec.size()<<std::endl; //Is this line valid here??
}
Is there any difference between the two functions, both in terms of functionality and in terms of safety?
Or in other words, if I only have a pointer/reference to a vector and need to resize it how can I be sure where the vector will actually be in memory, or what the pointer to the vector really is, after I operate on it. Thanks.

In terms of functionality, in the very limited context you gave us, they are essentially the same.
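To make the safety side of this concrete: growing a vector may move its element buffer, but the std::vector object itself stays where it is, so a pointer or reference to the vector remains valid across push_back. Only pointers, references and iterators to its elements may be invalidated. A minimal sketch of that distinction, where the helper names are hypothetical and simply mirror function1 and function2 from the question:

```cpp
#include <vector>

// Hypothetical stand-in for function1: grows the vector through a pointer.
void append_via_pointer(std::vector<int>* vec) {
    for (int i = 0; i < 10; ++i) {
        vec->push_back(i);  // same as (*vec).push_back(i)
    }
}

// Hypothetical stand-in for function2: grows the vector through a reference.
void append_via_reference(std::vector<int>& vec) {
    for (int i = 0; i < 10; ++i) {
        vec.push_back(i);
    }
}

// Returns true if the vector object kept its address while it grew.
// Note: v.data() (the element buffer) may well have changed; that is why
// pointers to *elements* must not be kept across push_back.
bool vector_object_stays_put() {
    std::vector<int> v;
    const std::vector<int>* before = &v;
    append_via_pointer(&v);
    append_via_reference(v);
    return before == &v && v.size() == 20;
}
```

Both helpers leave the caller's vector in the same state; the only practical difference is that the pointer version can be handed a null pointer and would then need a check.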
More generally, if you want to write generic code, consider that operations and operators bind directly to references, but not to pointers:
a = b + c;
To compile, this requires
A operator+(const B&, const C&);
But
A* operator+(const B*, const C*);
is an entirely different beast.
Also, an expression taking references and one taking values have the same syntax, but an expression taking pointers requires the pointers to be dereferenced to provide equal semantics. This leads to different expression syntax (*a + *b versus a + b) and thus to "less general code".
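A small sketch of the "less general code" point, using a hypothetical Vec2 type and hypothetical helper names. The reference-taking template works unchanged for built-in types, standard types and user types, while the pointer-taking variant has to dereference explicitly. (As a side note, an operator+ whose operands are all pointers cannot actually be declared: an overloaded operator needs at least one operand of class or enumeration type.)

```cpp
#include <string>

// Hypothetical value type used only for illustration.
struct Vec2 {
    int x;
    int y;
};

// The operator binds to references, so "a + b" works directly on objects.
Vec2 operator+(const Vec2& a, const Vec2& b) {
    return Vec2{a.x + b.x, a.y + b.y};
}

// Generic code written against references: same syntax for every T.
template <typename T>
T add(const T& a, const T& b) {
    return a + b;
}

// The pointer-taking equivalent needs different syntax inside (*a + *b)
// and at every call site (&x, &y).
template <typename T>
T add_through_pointers(const T* a, const T* b) {
    return *a + *b;
}
```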
On the other hand, if you are writing a class that has runtime polymorphism (with Liskov substitution in mind), you will most likely deal with dynamically allocated objects, and hence manipulating them through pointers may be more natural.
There are "grey areas" where the two approaches mesh, but in general pointer-taking functions are more frequent in runtime-based OOP frameworks, while reference-taking functions are more frequent in "value-based generic algorithms", where static type deduction is expected and on-stack allocation is most likely wanted.

Related

In C++ can I treat an array of single-member unions as an array of the element?

Suppose I am writing a fixed-size array class of runtime size, somewhat equivalent to Rust's Box<[T]> in order to save the space of tracking capacity when I know the array isn't going to change size after initialization.
In order to support types which do not have a default constructor, I want to be able to allow the user to supply a generator function that takes the index of the element and produces a T. In order to do this and decouple allocation and initialization, I follow the advice in CJ Johnson's CppCon 2019 talk "How to Hold a T" and initially create the array in terms of a single-member union:
template<typename T>
union MaybeUninit
{
MaybeUninit() {}
~MaybeUninit() {}
T val;
};
// ...
m_array = new MaybeUninit<T>[size];
// initialize all elements by setting val for each item
T* items = reinterpret_cast<T*>(m_array); // is this OK and dereferenceable?
My question is, once the generator is done and all the elements of m_array are initialized, am I allowed (according to the standard, regardless of whether a given compiler implementation permits it) to use reinterpret_cast<T*>(m_array) to treat the result as an array of the actual objects (the line marked "is this OK")? If not, is there any way to get from MaybeUninit<T>* to T* without copying?
In terms of which standard, I'm mainly interested in C++17 or later.
am I allowed (according to the standard, regardless of whether a given compiler implementation permits it) to use reinterpret_cast<T*>(m_array) to treat the result as an array of the actual objects (the line marked "is this OK")?
No, any pointer arithmetic on the resulting pointer will result in UB (for indices >1) or result in a one-past-the-end pointer that can't be dereferenced (for index 1). Only accessing the element at index 0 this way is allowed (but it still needs to be constructed first).
Pointer arithmetic is only allowed on pointers to the elements of an array. Your pointer does not point to an object that is an element of an array of T, so for the purpose of pointer arithmetic it is considered to belong to an array of length 1.
If not, is there any way to get from MaybeUninit<T>* to T* without copying?
The pointer conversion is not an issue, but you can't index into the resulting pointer. The only way to avoid this is to have an actual array of T objects.
Note however that you don't need to construct every element in an array of T objects. For example:
std::allocator<T> alloc;
T* ptr = std::allocator_traits<decltype(alloc)>::allocate(alloc, size); // allocate takes the allocator as its first argument
Now ptr is a pointer to an array of size objects of type T, but no actual objects of type T in it have their lifetime started. You can construct individual elements into it with placement-new (or std::construct_at since C++20, or std::allocator_traits<decltype(alloc)>::construct) and destroy them with a destructor call (or std::destroy_at or std::allocator_traits<decltype(alloc)>::destroy). You need to do this with your union approach as well anyhow. This approach also allows you to easily exchange the allocator with a different one.
There will be no overhead for size or capacity management. All of that is now the responsibility of the user. Whether this is a good idea is a different question.
Instead of std::allocator or an alternative Allocator implementation you could also use other functions that allocate memory and are specified to implicitly create objects, e.g. operator new or std::malloc, etc.
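Putting the allocator-based approach together, a fixed-size array filled by a generator could look roughly like the sketch below. FixedArray, Tag and all member names are hypothetical and only illustrate the allocate / construct / destroy / deallocate sequence described above; copying is simply disabled to keep the sketch short.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical element type without a default constructor.
struct Tag {
    explicit Tag(std::size_t i) : index(i) {}
    std::size_t index;
};

// Hypothetical fixed-size array: stores only a pointer and a size.
template <typename T>
class FixedArray {
public:
    // gen(i) must return something a T can be constructed from.
    template <typename Generator>
    FixedArray(std::size_t size, Generator gen)
        : m_size(size), m_data(Traits::allocate(m_alloc, size)) {
        std::size_t built = 0;
        try {
            for (; built < size; ++built) {
                Traits::construct(m_alloc, m_data + built, gen(built));
            }
        } catch (...) {
            // Undo the elements constructed so far, then release the storage.
            destroy_first(built);
            Traits::deallocate(m_alloc, m_data, m_size);
            throw;
        }
    }

    ~FixedArray() {
        destroy_first(m_size);
        Traits::deallocate(m_alloc, m_data, m_size);
    }

    FixedArray(const FixedArray&) = delete;
    FixedArray& operator=(const FixedArray&) = delete;

    T& operator[](std::size_t i) { return m_data[i]; }
    const T& operator[](std::size_t i) const { return m_data[i]; }
    std::size_t size() const { return m_size; }

private:
    using Traits = std::allocator_traits<std::allocator<T>>;

    // Destroys elements [0, n) in reverse order of construction.
    void destroy_first(std::size_t n) {
        while (n > 0) {
            Traits::destroy(m_alloc, m_data + --n);
        }
    }

    std::allocator<T> m_alloc;
    std::size_t m_size;
    T* m_data;
};
```

A caller would write something like FixedArray<Tag> tags(4, [](std::size_t i) { return Tag(i * 10); }); and then index tags[3] as with any array of T.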

range based for loop with an auto reference to a pointer

With regard to the following code I would like to have some clarification. We have an array of pointers to a class. Next we loop over the array using a range based loop. For this range based loop auto& is used. But next when we use the element a we can use the arrow operator to call a function.
This code is compiled using C++ 11.
// Definition of an array of pointers to some class.
some_class* array[10];
// The array of pointers is set.
// Loop over the array.
for(auto& a : array)
{
// Call some function using the arrow operator.
a->some_func();
}
Is my understanding correct that auto& a is a reference to a pointer? Is this not a bit over kill. Would using auto a not create a copy of the pointer and take up the same amount of memory?
Your code compiles fine.
Nevertheless, there is not really a point in using a reference here if you don't want to change the pointers.
Best practice here is:
Use const auto& if the elements shall not be changed. The reference matters if the deduced element type is large; otherwise each element is copied.
Use auto& if you want to change the elements of the container you are iterating over.
Is my understanding correct that auto& a is a reference to a pointer?
Yes that's correct
Would using auto a not create a copy of the pointer and take up the same amount of memory
Think of a reference as an alias for the variable, that is, a different name for the same object.
As for "not create a copy of the pointer":
A pointer is very lightweight and copying a pointer is relatively cheap (that's how views are implemented: pointers to sequences). If the underlying element type of the container you are iterating over is a fundamental type or a pointer to some type, auto is enough. In cases where the underlying element type is a heavyweight object, auto& is a better alternative (and of course you can add a const qualifier if you don't want to modify it).
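The deduced types can be checked at compile time. In the sketch below, Counter is a hypothetical stand-in for some_class; the static_asserts confirm that auto& yields a reference to the pointer element while plain auto yields a copy of the pointer, and both loops reach the same pointed-to object.

```cpp
#include <type_traits>

// Hypothetical stand-in for some_class from the question.
struct Counter {
    int hits = 0;
    void some_func() { ++hits; }
};

// Calls some_func through both loop forms and returns the total hit count.
int call_through_both_loops() {
    Counter c;
    Counter* array[3] = {&c, &c, &c};

    for (auto& a : array) {
        static_assert(std::is_same<decltype(a), Counter*&>::value,
                      "auto& is a reference to the pointer element");
        a->some_func();
    }

    for (auto a : array) {
        static_assert(std::is_same<decltype(a), Counter*>::value,
                      "auto is a copy of the pointer");
        a->some_func();
    }

    return c.hits;  // three calls per loop
}
```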

Declaring arrays in C++

I am new to C++ and currently learning it with a book by myself. This book seems to say that there are several kinds of arrays depending on how you declare it. I guess the difference between dynamic arrays and static arrays are clear to me. But I do not understand the difference between the STL std::array class and a static array.
An STL std::array variable is declared as:
std::array < int, arraySize > array1;
Whereas a static array variable is declared as:
int array1[arraySize];
Is there a fundamental difference between the two? Or is it just syntax and the two are basically the same?
A std::array<> is just a light wrapper around a C-style array, with some additional nice interface member functions (like begin, end etc) and typedefs, roughly defined as
template<typename T, size_t N>
class array
{
public:
T _arr[N];
T& operator[](size_t);
const T& operator[](size_t) const;
// other member functions and typedefs
};
One fundamental difference though is that the former can be passed by value, whereas for the latter you only pass a pointer to its first element or you can pass it by reference, but you cannot copy it into the function (except via a std::copy or manually).
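A short sketch of that difference, with hypothetical function names: the std::array parameter is a full copy, so the caller's array is untouched, whereas the C-style array arrives as a pointer to its first element and is modified in place.

```cpp
#include <array>

// Takes the whole std::array by value: changes affect only the local copy.
int zero_first_of_copy(std::array<int, 3> a) {
    a[0] = 0;
    return a[0];
}

// A C-style array argument decays to int*: changes affect the caller's array.
void zero_first_in_place(int* a) {
    a[0] = 0;
}
```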
A common mistake is to assume that every time you pass a C-style array to a function you lose its size due to the array decaying to a pointer. This is not always true. If you pass it by reference, you can recover its size, as there is no decay in this case:
#include <iostream>
template<typename T, size_t N>
void f(T (&arr)[N]) // the type of arr is T(&)[N], not T*
{
std::cout << "I'm an array of size " << N;
}
int main()
{
int arr[10];
f(arr); // outputs its size, there is no decay happening
}
The main difference between these two is an important one.
Besides the nice methods the STL gives you, when passing a std::array to a function, there is no decay. Meaning, when you receive the std::array in the function, it is still a std::array, but when you pass an int[] array to a function, it effectively decays to an int* pointer and the size of the array is lost.
This difference is a major one. Once you lose the array size, the code is now prone to a lot of bugs, as you have to keep track of the array size manually. sizeof() returns the size of a pointer type instead of the number of elements in the array. This forces you to manually keep track of the array size using interfaces like process(int *array, int size). This is an ok solution, but prone to errors.
See the guidelines by Bjarne Stroustrup:
https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#Rp-run-time
That can be avoided with a better data type, which std::array is designed for, among many other STL classes.
As a side note, unless there's a strong reason to use a fixed size array, std::vector may be a better choice as a contiguous memory data structure.
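The two interface styles discussed in this answer can be compared side by side. This is only a sketch with hypothetical function names: sum_raw depends on the caller passing a correct size, while sum_array gets the size from the type.

```cpp
#include <array>
#include <cstddef>

// C-style interface: the size travels separately and the caller must keep
// it in sync with the actual array.
int sum_raw(const int* data, std::size_t size) {
    int total = 0;
    for (std::size_t i = 0; i < size; ++i) {
        total += data[i];
    }
    return total;
}

// std::array interface: N is part of the type, so it cannot go out of sync.
template <std::size_t N>
int sum_array(const std::array<int, N>& data) {
    int total = 0;
    for (int value : data) {
        total += value;
    }
    return total;
}
```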
std::array and C-style arrays are similar:
They both store a contiguous sequence of objects
They are both aggregate types and can therefore be initialized using aggregate initialization
Their size is known at compile time
They do not use dynamic memory allocation
An important advantage of std::array is that it can be passed by value and doesn't implicitly decay to a pointer like a C-style array does.
In both cases, the array is created on the stack.
However, the STL's std::array class template offers some advantages over the "raw" C-like array syntax of your second case:
int array1[arraySize];
For example, with std::array you have a typical STL interface, with methods like size (which you can use to query the array's element count), front, back, at, etc.
Is there a fundamental difference between the two? or is it just syntax and the two are basically the same?
There are a number of differences between a raw C-style array (built-in array) and std::array.
As you can see from the reference documentation, there are a number of operations available that a raw array doesn't have:
E.g.: Element access
at()
front()
back()
data()
The underlying data type of the std::array is still a raw array, but garnished with "syntactic sugar" (if that should be your concern).
The key difference between std::array<> and a C-style array is that the former is a class that wraps around the latter. The class has begin() and end() methods that allow std::array objects to be easily passed as parameters to STL algorithms that expect iterators (note that C-style arrays can be too, via the non-member std::begin/std::end functions). begin() points to the beginning of the array and end() points one element beyond its end. You see this pattern with other STL containers, such as std::vector, std::map, std::set, etc.
What's also nice about the STL std::array is that it has a size() method that lets you get the element count. To get the element count of a C-style array, you'll have to write sizeof(cArray)/sizeof(cArray[0]), so doesn't stlArray.size() look much more readable?
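As a brief illustration of the iterator and size points, the sketch below sorts a std::array through its member begin()/end() and a C-style array through the non-member std::begin/std::end, then compares the two ways of getting the element count. The function name is hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <iterator>

// Sorts both kinds of array with the same algorithm and returns true if
// they end up with identical, sorted contents and the same element count.
bool sort_both_kinds() {
    std::array<int, 4> stl_array{{3, 1, 4, 2}};
    int c_array[4] = {3, 1, 4, 2};

    std::sort(stl_array.begin(), stl_array.end());      // member iterators
    std::sort(std::begin(c_array), std::end(c_array));  // non-member helpers

    const bool same_count =
        stl_array.size() == sizeof(c_array) / sizeof(c_array[0]);

    return same_count &&
           std::is_sorted(std::begin(c_array), std::end(c_array)) &&
           std::equal(stl_array.begin(), stl_array.end(), std::begin(c_array));
}
```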
You can get full reference here:
http://en.cppreference.com/w/cpp/container/array
Usually you should prefer std::array<T, size> array1; over T array2[size];, although the underlying structure is identical.
The main reason for that is that std::array always knows its size. You can call its size() method to get the size. Whereas when you use a C-style array (i.e. what you called "built-in array") you always have to pass the size around to functions that work with that array. If you get that wrong somehow, you could cause a buffer overflow, where the function reads from or writes to memory that does not belong to the array. This is much less likely with std::array, because the size always travels with the object (although operator[] is still unchecked; at() performs bounds checking).
IMO,
Pros: It’s efficient, in that it doesn’t use any more memory than built-in fixed arrays.
Cons: Compared with a built-in fixed array, std::array has a slightly more awkward syntax, and you have to specify the array length explicitly (the compiler won’t calculate it for you from the initializer, unless you rely on C++17 class template argument deduction, as in std::array a{1, 2, 3};).

for(auto &pointer : vectorOfPointers) vs for(auto pointer : vectorOfPointers)

I was wondering... is there any real difference between:
for(auto &pointer : vectorOfPointers){pointer->fun();}
and
for(auto pointer : vectorOfPointers){pointer->fun();}
where vectorOfPointers is declared as simple vector of normal, old-school pointers:
std::vector<SomeType *> vectorOfPointers;
?
I know that & in for(auto &o : objects) stands for reference, while for(auto o : objects) is the loop on the values. But the "values" in my examples are pointers themselves - I can access the objects to which they point and modify them with both loops.
So, is there any difference? If "not really" (in both the usage and in what the compiler would generate from them), maybe one of those 2 options is a commonly used/approved one?
Let's not add smart pointers to that discussion, I'm rather interested in that precise situation.
So, is there any difference?
In this specific example, no; both loops do the same thing, and should produce (more or less) the same code.
More generally, a non-const reference allows you to modify the vector elements. A copy doesn't, but (for complex types) might be less efficient, and requires the type to be copyable.
maybe one of those 2 options is a commonly used/approved one?
I use the same rule of thumb as for function parameters: by non-const reference only if I want to allow modification; otherwise, by value for simple types or by const reference for complex or non-copyable types.
In the first case you have references to the pointers in your vector. In the second case you have copies of the pointers from your vector. If you were to modify pointer, only in the first case would the pointers inside your vector also be modified.
The fact that your vector contains pointers is really beside the point. This behaviour is the same regardless.
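The case where the two loops do differ, modifying the pointer itself, can be sketched as follows. The helper names are hypothetical; assigning nullptr to the loop variable only changes the vector when the variable is a reference.

```cpp
#include <cstddef>
#include <vector>

// Counts how many stored pointers are null.
std::size_t count_null(const std::vector<int*>& pointers) {
    std::size_t nulls = 0;
    for (const int* p : pointers) {
        if (p == nullptr) ++nulls;
    }
    return nulls;
}

// Assigns to a copy of each pointer: the vector's elements are unchanged.
void reset_copies(std::vector<int*>& pointers) {
    for (auto pointer : pointers) {
        pointer = nullptr;
        (void)pointer;  // silence "set but not used" warnings
    }
}

// Assigns through a reference: the vector's elements become null.
void reset_elements(std::vector<int*>& pointers) {
    for (auto& pointer : pointers) {
        pointer = nullptr;
    }
}
```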

Returning an object or a pointer in C++

In C++, should my method return an object or a pointer to an object? How to decide? What if it's an operator? How can I define?
And one more thing - if the pointer turns to be a vector, how can I find out its size after returned? And if it's impossible, as I think it is, how should I proceed to correctly return an array without this limitation?
In C++, should my method return an object or a pointer to an object?
How to decide?
Since C++11 we have move semantics in C++, which means that returning by value is as easy as before and now also fast. That should be the default.
What if it's an operator? How can I define?
Many operators such as operator= normally return a reference to *this
X& X::operator=(X rhs);
You need to look that up for each operator if you would like to comply with the usual patterns (and you should). Start here: Operator overloading
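As a sketch of the usual pattern, with a hypothetical Meters class: assignment-like operators (operator=, operator+=) return a reference to *this so that they can be chained, while operators that produce a new value (operator+) return an object by value.

```cpp
#include <utility>

// Hypothetical value type used only to illustrate the return conventions.
class Meters {
public:
    explicit Meters(int value = 0) : value_(value) {}

    // Copy-and-swap assignment: takes rhs by value, returns *this by reference.
    Meters& operator=(Meters rhs) {
        std::swap(value_, rhs.value_);
        return *this;
    }

    // Compound assignment also returns a reference to *this.
    Meters& operator+=(const Meters& rhs) {
        value_ += rhs.value_;
        return *this;
    }

    int value() const { return value_; }

private:
    int value_;
};

// Binary operator+ creates a new object, so it returns by value.
Meters operator+(Meters lhs, const Meters& rhs) {
    lhs += rhs;
    return lhs;
}
```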
As pointed out by Ed S., return value optimization also applies (even before C++11), meaning that often the object you return needs to be neither copied nor moved.
So, this is now the way to return stuff:
std::string getstring(){
std::string foo("hello");
foo+=" world";
return foo;
}
The fact that I made a foo object here is not my point, even if you did just do return "hello world"; this is the way to go.
And one more thing - if the pointer turns to be a vector, how can I
find out its size after returned? And if it's impossible, as I think
it is, how should I proceed to correctly return an array without this
limitation?
The same goes for all copyable or movable types in the standard library (these are almost all types, for example vectors, sets, and so on), with a few exceptions. For example, std::array does not gain from moving: a move takes time proportional to the number of elements. There you could return it in a unique_ptr to avoid the copy.
typedef std::array<int,15> MyArray;
std::unique_ptr<MyArray> getArray(){
std::unique_ptr<MyArray> someArrayObj(new MyArray());
someArrayObj->at(3)=5;
return someArrayObj;
}
int main(){
auto x=getArray();
std::cout << x->at(3) <<std::endl; // or since we know the index is right: (*x)[3]
}
Now, to avoid ever writing new anymore (except for experts in rare cases) you should use a helper function called make_unique. That will vastly help exception safety, and is as convenient:
std::unique_ptr<MyArray> getArray(){
auto someArrayObj=std::make_unique<MyArray>(); // std::make_unique is in <memory> since C++14
someArrayObj->at(3)=5;
return someArrayObj;
}
For more motivation and the (really short) implementation of make_unique, have a look here:
make_unique and perfect forwarding
Update
Now make_unique is part of the C++14 standard. If you don't have it, you can find and use the whole implementation from the proposal by S.T.L.:
Ideone example on how to do that
In C++, should my method return an object or a pointer to an object?
You should return an object by default. The usual exceptions are functions that return a subclass of a given class, and functions for which returning nothing is a legal option (see footnote 1).
What if it's an operator?
Operators return references or objects; although it is technically possible to return pointers from overloaded operators, it is not usually done.
And one more thing - if the pointer turns to be a vector, how can I find out its size after returned?
I think you meant an array rather than a vector, because std::vector has a size() member function returning the size of the vector. Finding the size of a variable-length array is indeed not possible.
And if it's impossible, as I think it is, how should I proceed to correctly return an array without this limitation?
You should use std::vector, it does not limit you on the size or the type of elements that go into it.
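For instance, a function that would otherwise return a raw array plus a separate length can return a std::vector by value, and the caller reads the size from the result. make_squares is a hypothetical example function.

```cpp
#include <cstddef>
#include <vector>

// Returns the first `count` square numbers; the size travels with the result.
std::vector<int> make_squares(std::size_t count) {
    std::vector<int> result;
    result.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        result.push_back(static_cast<int>(i * i));
    }
    return result;  // moved or elided, not deep-copied
}
```

The caller simply writes auto squares = make_squares(5); and queries squares.size().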
1 In which case you return NULL or nullptr in C++11.
Unless there is some specific reason to use plain pointers, always return something memory-safe. In an estimated 95% of all cases, simply returning objects is fine, and then return-by-value is definitely the canonical thing to do (simple, efficient, good!).
The remaining 5% are mostly cases where the returned object is runtime-polymorphic; such an object can't be returned by value in C++, because a by-value return has one fixed static type and would slice a derived object. In such a case, you should return a smart pointer to the new object; in C++11 the standard choice is std::unique_ptr. There is also the case where you want to optionally return something, but that's IMO a case for a dedicated container such as boost::optional (or std::optional since C++17) rather than for pointers.
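A minimal sketch of the polymorphic case, assuming a hypothetical Shape hierarchy and factory name. The factory returns std::unique_ptr<Shape>, so the dynamic type survives and ownership is explicit; returning a null pointer for an unknown kind is just one possible convention here (the one from the footnote above), not the only option.

```cpp
#include <memory>
#include <string>

// Hypothetical polymorphic hierarchy.
struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const = 0;
};

struct Circle : Shape {
    std::string name() const override { return "circle"; }
};

struct Square : Shape {
    std::string name() const override { return "square"; }
};

// Hypothetical factory: the concrete type is chosen at run time, so the
// result is returned through a smart pointer rather than by value.
std::unique_ptr<Shape> make_shape(const std::string& kind) {
    if (kind == "circle") return std::make_unique<Circle>();
    if (kind == "square") return std::make_unique<Square>();
    return nullptr;  // assumption: unknown kinds yield a null pointer
}
```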