I was given a homework question that really confuses me. The question is:
In C++ the equality test == may be
applied to arrays, but the assignment
operator = cannot be applied to
arrays. Explain why.
This confuses me, because my understanding is that the == operator would just compare the addresses of the two arrays' first elements (which, if the two arrays were in fact held in separate memory locations, would of course be different). And the = operator, when used like array1 = array2; would just cause array1 to point to the same memory location as array2 does.
What am I missing here? It seems as though either operator can be used, but neither would produce the results typically intended by those operators.
my understanding is that the == operator would just compare the addresses of the two arrays' first elements
This is correct: if you compare two arrays using ==, the comparison is between the addresses of the arrays, so it will only yield true if you compare an array with itself (or with a pointer to its own first element). See the description below for why.
the = operator, when used like array1 = array2; would just cause array1 to point to the same memory location as array2 does.
This is not correct, because an array is not a pointer. array1 can't be made to point to the same memory location as array2, because array1 isn't a pointer; it's an array of elements.
An array is a sequence of elements. In most contexts, the name of an array is implicitly converted to a pointer to its initial element. This is why you can do things like:
void f(int*);
int data[10];
int* p = data; // this is the same as 'int* p = &data[0];'
f(data); // this is the same as 'f(&data[0]);'
array1 = array2; won't work because arrays are not assignable, mostly for historical reasons: it was never allowed in C, and C has been around for decades; I've never heard a convincing technical reason why it isn't allowed. There's some discussion of this in the comments and answers to "Why does C support memberwise assignment of arrays within structs but not generally?".
The following program will not compile:
int main() {
    int a[10], b[10];
    a = b;
}
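If the goal is to copy the elements of one built-in array into another, the standard library already covers that; a minimal sketch using std::copy (the arrays a and b here are just illustrative):

#include <algorithm>
#include <iterator>

int main() {
    int a[10] = {};
    int b[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    std::copy(std::begin(b), std::end(b), std::begin(a)); // element-wise copy
}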
For an "assignable" array, you can use the array container-like class found in Boost (boost::array), C++ TR1 (std::tr1::array), or C++0x (std::array). It is actually a class that contains an array; it can be copied and it provides many of the benefits of the Standard Library containers plus the performance characteristics of an array and the ability to use its data as an array when you need to.
Related
Suppose I am writing a fixed-size array class whose size is chosen at runtime, somewhat equivalent to Rust's Box<[T]>, in order to save the space of tracking capacity when I know the array isn't going to change size after initialization.
In order to support types which do not have a default constructor, I want to be able to allow the user to supply a generator function that takes the index of the element and produces a T. In order to do this and decouple allocation and initialization, I follow the advice in CJ Johnson's CppCon 2019 talk "How to Hold a T" and initially create the array in terms of a single-member union:
template<typename T>
union MaybeUninit
{
    MaybeUninit() {}
    ~MaybeUninit() {}
    T val;
};
// ...
m_array = new MaybeUninit<T>[size];
// initialize all elements by setting val for each item
T* items = reinterpret_cast<T*>(m_array); // is this OK and dereferenceable?
My question is, once the generator is done and all the elements of m_array are initialized, am I allowed (according to the standard, regardless of whether a given compiler implementation permits it) to use reinterpret_cast<T*>(m_array) to treat the result as an array of the actual objects (the line marked "is this OK")? If not, is there any way to get from MaybeUninit<T>* to T* without copying?
In terms of which standard, I'm mainly interested in C++17 or later.
am I allowed (according to the standard, regardless of whether a given compiler implementation permits it) to use reinterpret_cast<T*>(m_array) to treat the result as an array of the actual objects (the line marked "is this OK")?
No. Any pointer arithmetic on the resulting pointer will result in UB (for indices > 1) or yield a one-past-the-end pointer that can't be dereferenced (for index 1). Only accessing the element at index 0 this way is allowed (and that element still needs to have been constructed).
Pointer arithmetic is only allowed on pointers to the elements of an array. Your pointer does not point to an object that is an element of an array of T (an object that isn't an array element is, for the purposes of pointer arithmetic, considered to belong to an array of length 1).
If not, is there any way to get from MaybeUninit<T>* to T* without copying?
The pointer conversion is not an issue, but you can't index into the resulting pointer. The only way to avoid this is to have an actual array of T objects.
Note however that you don't need to construct every element in an array of T objects. For example:
std::allocator<T> alloc;
T* ptr = std::allocator_traits<decltype(alloc)>::allocate(alloc, size);
Now ptr is a pointer to an array of size objects of type T, but no actual objects of type T in it have their lifetime started. You can construct individual elements into it with placement-new (or std::construct_at or std::allocator_traits<decltype(alloc)>::construct) and destroy them with a destructor call (or std::destroy_at or std::allocator_traits<decltype(alloc)>::destroy). You need to do this with your union approach as well anyhow. This approach also allows you to easily exchange the allocator for a different one.
There will be no overhead for size or capacity management. All of that is now responsibility of the user. Whether this is a good idea is a different question.
Instead of std::allocator or an alternative Allocator implementation you could also use other functions that allocate memory and are specified to implicitly create objects, e.g. operator new or std::malloc, etc.
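Putting those pieces together, a minimal sketch of the full allocate/construct/destroy/deallocate cycle, using int as a stand-in element type:

#include <cstddef>
#include <memory>

int main() {
    constexpr std::size_t size = 5;
    std::allocator<int> alloc;
    using traits = std::allocator_traits<std::allocator<int>>;

    int* ptr = traits::allocate(alloc, size);      // raw storage, no objects yet
    for (std::size_t i = 0; i < size; ++i)
        traits::construct(alloc, ptr + i, int(i)); // begin each element's lifetime

    // ... use ptr[0] .. ptr[size - 1] like a normal array ...

    for (std::size_t i = size; i-- > 0;)
        traits::destroy(alloc, ptr + i);           // end lifetimes in reverse order
    traits::deallocate(alloc, ptr, size);          // release the storage
}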
Why doesn't C++ support range based for loop over dynamic arrays? That is, something like this:
int* array = new int[len];
for[] (int i : array) {};
I just invented the for[] statement to rhyme with new[] and delete[]. As far as I understand, the runtime has the size of the array available (otherwise delete[] could not work) so in theory, range based for loop could also be made to work. What is the reason that it's not made to work?
What is the reason that it's not made to work?
A range based loop like
for(auto a : y) {
    // ...
}
is just syntactic sugar for the following expression
auto endit = std::end(y);
for(auto it = std::begin(y); it != endit; ++it) {
    auto a = *it;
    // ...
}
Since std::begin() and std::end() cannot be used with a plain pointer, this rewrite can't be applied to a pointer obtained from new[].
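A short sketch of the asymmetry (std::begin/std::end have overloads for arrays of known bound, but none for raw pointers):

#include <iterator>

int main() {
    int stack_array[4] = {1, 2, 3, 4};
    for (int i : stack_array) { (void)i; } // OK: the bound 4 is part of the type

    int* heap_array = new int[4];
    // for (int i : heap_array) {}         // error: no begin/end for int*
    delete[] heap_array;
}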
As far as I understand, the runtime has the size of the array available (otherwise delete[] could not work)
How delete[] keeps track of the memory block that was allocated with new[] (which isn't necessarily the same size as was specified by the user), is a completely different thing and the compiler most probably doesn't even know how exactly this is implemented.
When you have this:
int* array = new int[len];
The problem here is that your variable called array is not an array at all. It is a pointer. That means it only contains the address of one object (in this case the first element of the array created using new).
For range based for to work the compiler needs two addresses, the beginning and the end of the array.
So the problem is the compiler does not have enough information to do this:
// array is only a pointer and does not have enough information
for(int i : array)
{
}
int* array = new int[len];
for[] (int i : array) {}
There are several points which must be addressed; I'll tackle them one at a time.
Does the run-time know the size of the array?
In certain conditions, it must. As you pointed out, a call to delete[] will call the destructor of each element (in reverse order) and therefore must know how many there are.
However, since the standard never specifies that the number of elements must be known and accessible, an implementation is allowed to omit it whenever the destructor calls are not required (that is, when std::is_trivially_destructible<T>::value evaluates to true).
Can the run-time distinguish between pointer and array?
In general, no.
When you have a pointer, it could point to anything:
a single item, or an item in an array,
the first item in an array, or any other,
an array on the stack, or an array on the heap,
just an array, or an array part of a larger object.
This is the reason why delete[] exists, and why using plain delete here would be incorrect. With delete[], you, the user, state: this pointer points to the first item of a heap-allocated array.
The implementation can then assume that, for example, in the 8 bytes preceding this first item it can find the size of the array. Without you guaranteeing this, those 8 bytes could be anything.
Then, why not go all the way and create for[] (int i : array)?
There are two reasons:
As mentioned, today an implementation can elide the stored element count for types that are trivially destructible; with this new for[] syntax, that per-type optimization would no longer be possible.
It's not worth it.
Let us be honest, new[] and delete[] are relics of an older time. They are incredibly awkward:
the number of elements has to be known in advance, and cannot be changed,
the elements must be default constructible, or otherwise C-ish,
and unsafe to use:
the number of elements is inaccessible to the user.
There is generally no reason to use new[] and delete[] in modern C++. Most of the time a std::vector should be preferred; in the few instances where the capacity is superfluous, a std::dynarray (proposed for C++14, though never standardized) would still be better, because it keeps track of the size.
Therefore, without a valid reason to keep using these statements, there is no motivation to include new semantic constructs specifically dedicated to handling them.
And should anyone be motivated enough to make such a proposal:
the inhibition of the current optimization, a violation of C++ philosophy of "You don't pay for what you don't use", would likely be held against them,
the inclusion of new syntax, when modern C++ proposals have gone to great lengths to avoid it as much as possible (to the point of having a library defined std::variant), would also likely be held against them.
I recommend that you simply use std::vector.
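For completeness, a sketch of the std::vector version; the size travels with the object, so the range-based loop just works:

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::size_t len = 5;
    std::vector<int> v(len);  // replaces 'new int[len]'; memory is freed automatically
    for (int& x : v) x = 42;  // the vector knows its own bounds
    for (int x : v) std::cout << x << ' ';
}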
This is not really specific to dynamic arrays; it is more general. Of course, for dynamic arrays the size must exist somewhere so that delete[] can call the destructors (though note the standard doesn't say anything about how, just that calling delete[] works as intended).
The problem is with pointers in general: given a pointer, you can't tell what it corresponds to.
Arrays decay to pointers, but given only the pointer, what can you say about the array?
array is not an array but a pointer, and there's no information about the size of the "array". So the compiler cannot deduce the begin and end of this array.
See the syntax of range based for loop:
{
auto && __range = range_expression ;
for (auto __begin = begin_expr, __end = end_expr;
__begin != __end; ++__begin) {
range_declaration = *__begin;
loop_statement
}
}
range_expression - any expression that represents a suitable sequence
(either an array or an object for which begin and end member functions
or free functions are defined, see below) or a braced-init-list.
auto works at compile time, so begin_expr and end_expr cannot be deduced at runtime.
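Concretely, when the bound is part of the array's type the compiler can materialize both ends of the range at compile time; a sketch of roughly what it generates (b and e stand in for __begin and __end):

int main() {
    int arr[5] = {1, 2, 3, 4, 5};
    // 'for (int i : arr)' expands roughly to:
    for (int *b = arr, *e = arr + 5; b != e; ++b) {
        int i = *b;
        (void)i; // loop body would go here
    }
    // With 'int* p = new int[5];' the type of p carries no bound,
    // so there is no 'p + 5' the compiler could synthesize.
}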
The reason is that, given only the value of the pointer array, the compiler (and your code) has no information about what it points at. The only thing known is that array has a value which is the address of a single int.
It could point at the first element of a statically allocated array. It could point at an element in the middle of a dynamically allocated array. It could point at a member of a data structure. It could point at an element of an array that is within a data structure. The list goes on.
Your code will make ASSUMPTIONS about what the pointer points at. It may assume it is an array of 50 elements. Your code may access the value of len, and assume array points at the (first element of) an array of len elements. If your code gets it right, all works as intended. If your code gets it wrong (e.g. accessing the 50th element of an array with 5 elements) then the behaviour is simply undefined. It is undefined because the possibilities are endless - the book-keeping to keep track of what an arbitrary pointer ACTUALLY points at (beyond the information that there is an int at that address) would be enormous.
You're starting with the ASSUMPTION that array points at the result from new int[len]. But that information is not stored in the value of array itself, so the compiler has no way to work back to a value of len. That would be needed for your "range based" approach to work.
Yes, given array = new int[len], the machinery invoked by delete [] array will work out that array has len elements and release them. But delete [] array also has undefined behaviour if array results from something other than a new [] expression. Even
int *array = new int;
delete [] array;
gives undefined behaviour. The "runtime" is not required to work out, in this case, that array is actually the address of a single dynamically allocated int (and not an actual array). So it is not required to cope with that.
I am new to C++ and currently learning it with a book by myself. This book seems to say that there are several kinds of arrays depending on how you declare them. I guess the difference between dynamic arrays and static arrays is clear to me. But I do not understand the difference between the STL std::array class and a static array.
An STL std::array variable is declared as:
std::array<int, arraySize> array1;
Whereas a static array variable is declared as:
int array1[arraySize];
Is there a fundamental difference between the two? Or is it just syntax and the two are basically the same?
A std::array<> is just a light wrapper around a C-style array, with some additional nice interface member functions (like begin, end etc) and typedefs, roughly defined as
template<typename T, size_t N>
class array
{
public:
    T _arr[N];
    T& operator[](size_t);
    const T& operator[](size_t) const;
    // other member functions and typedefs
};
One fundamental difference, though, is that the former can be passed by value, whereas for the latter you only pass a pointer to its first element, or you can pass the whole thing by reference; you cannot copy it into the function (except via std::copy or manually).
A common mistake is to assume that every time you pass a C-style array to a function you lose its size due to the array decaying to a pointer. This is not always true. If you pass it by reference, you can recover its size, as there is no decay in this case:
#include <cstddef>
#include <iostream>

template<typename T, std::size_t N>
void f(T (&arr)[N]) // the type of arr is T(&)[N], not T*
{
    std::cout << "I'm an array of size " << N;
}

int main()
{
    int arr[10];
    f(arr); // outputs its size, there is no decay happening
}
The main difference between these two is an important one.
Besides the nice methods the STL gives you, there is no decay when passing a std::array to a function: the function receives an actual std::array, size and all. When you pass an int[] array to a function, by contrast, it effectively decays to an int* pointer and the size of the array is lost.
This difference is a major one. Once you lose the array size, the code becomes prone to bugs: sizeof() returns the size of the pointer type instead of the number of elements in the array, so you are forced to keep track of the size manually, using interfaces like process(int *array, int size). That is an ok solution, but prone to errors.
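A quick sketch of that pitfall; inside the function the parameter is just a pointer, whatever the call site looked like (process here is the hypothetical interface named above):

#include <iostream>

void process(int* array, int size) {
    (void)size; // the element count has to travel separately
    // sizeof(array) is sizeof(int*), e.g. 8 on a typical 64-bit platform,
    // no matter how many elements the caller's array has:
    std::cout << sizeof(array) << '\n';
}

int main() {
    int data[50] = {};
    std::cout << sizeof(data) << '\n'; // 50 * sizeof(int): no decay has happened yet
    process(data, 50);
}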
See the guidelines by Bjarne Stroustrup:
https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#Rp-run-time
That can be avoided with a better data type, which std::array (among many other STL classes) is designed to be.
As a side note, unless there's a strong reason to use a fixed size array, std::vector may be a better choice as a contiguous memory data structure.
std::array and C-style arrays are similar:
They both store a contiguous sequence of objects
They are both aggregate types and can therefore be initialized using aggregate initialization
Their size is known at compile time
They do not use dynamic memory allocation
An important advantage of std::array is that it can be passed by value and doesn't implicitly decay to a pointer like a C-style array does.
In both cases (when declared as local variables), the array is created on the stack.
However, the STL's std::array class template offers some advantages over the "raw" C-like array syntax of your second case:
int array1[arraySize];
For example, with std::array you have a typical STL interface, with methods like size (which you can use to query the array's element count), front, back, at, etc.
Is there a fundamental difference between the two? or is it just syntax and the two are basically the same?
There are a number of differences between a raw C-style array (built-in array) and std::array.
As you can see from the reference documentation, there are a number of operations available with std::array that aren't with a raw array:
E.g.: Element access
at()
front()
back()
data()
The underlying data type of the std::array is still a raw array, but garnished with "syntactic sugar" (if that should be your concern).
The key difference between std::array<> and a C-style array is that the former is a class that wraps around the latter. The class has begin() and end() methods that allow std::array objects to be easily passed as parameters to STL algorithms that expect iterators (note that C-style arrays can be too, via the non-member std::begin/std::end functions). The first points to the beginning of the array and the second points to one element beyond its end. You see this pattern with other STL containers, such as std::vector, std::map, std::set, etc.
What's also nice about the STL std::array is that it has a size() method that lets you get the element count. To get the element count of a C-style array, you'd have to write sizeof(cArray)/sizeof(cArray[0]), so doesn't stlArray.size() look much more readable?
You can get full reference here:
http://en.cppreference.com/w/cpp/container/array
Usually you should prefer std::array<T, size> array1; over T array2[size];, although the underlying structure is identical.
The main reason for that is that std::array always knows its size. You can call its size() method to get the size. Whereas when you use a C-style array (i.e. what you called "built-in array") you always have to pass the size around to functions that work with that array. If you get that wrong somehow, you could cause buffer overflows and the function tries to read from/write to memory that does not belong to the array anymore. This cannot happen with std::array, because the size is always clear.
IMO,
Pros: it's efficient, in that it doesn't use any more memory than a built-in fixed array.
Cons: std::array has a slightly more awkward syntax than a built-in fixed array, and you have to explicitly specify the array length (before C++17's class template argument deduction, the compiler won't calculate it for you from the initializer).
I'm copying an array, and for some reason the values aren't the same after the copy. The code is below. In both cases, the _data variable is a char[4]. After the copy, the assert fires. If I examine the two values in the debugger, they show as: 0x00000000015700a8 and 0x00000000015700b0.
_data[0] = rhsG->_data[0];
_data[1] = rhsG->_data[1];
_data[2] = rhsG->_data[2];
_data[3] = rhsG->_data[3];
assert(_data == rhsG->_data);
You've made the mistake of thinking C++ is an easy-to-use high-level language (joke). operator == on C-style arrays compares their addresses, which of course are different here. You can use std::equal to compare the two arrays, or use a different data structure which supports a more intuitive operator ==, such as std::array or std::vector.
You could then also use their operator = to copy them, instead of copying each element one at a time, assuming the source and destination are the same size. There is std::copy if they are not, or if they must remain C-style arrays.
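A minimal sketch of both suggestions, using standalone char[4] buffers in place of the member arrays from the question:

#include <algorithm>
#include <cassert>
#include <iterator>

int main() {
    char src[4] = {'a', 'b', 'c', 'd'};
    char dst[4] = {};

    std::copy(std::begin(src), std::end(src), std::begin(dst));          // element-wise copy
    assert(std::equal(std::begin(dst), std::end(dst), std::begin(src))); // value comparison
}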
If you compare with ==, you are just comparing two pointers, whose values are different. If you want to compare two arrays for equality, you can use memcmp()
assert( ! memcmp(_data, rhsG->_data, 4) );
When you use operator == in the assert "_data == rhsG->_data", both _data and rhsG->_data represent the address of their array. So, in your debugger, 0x00000000015700a8 is the array address of _data and 0x00000000015700b0 is the array address of rhsG->_data. Obviously, they are different, so the assert fires.
After all, in an expression like this an array name is converted to a pointer to the array's first element in memory.
"_data == rhsG->_data" does not compare the individual elements of two arrays.
The "==" operator is not defined for arrays, so the two parameters are decayed to pointers, which == can work on.
Your assert is comparing the addresses of the two arrays which are different because they are in different memory locations.
If you really want to compare the values, then either loop over them or use memcmp.
assert(memcmp(_data, rhsG->_data, 4) == 0);
A quick question about how to safely pass and use vectors in C++.
I know that when using vectors you have to be very careful with addresses to them and their elements, because when you dynamically change their size, the elements may move to a different address (unless you use reserve etc., but I'm imagining I will not know how much space I will need).
Now I want to pass an existing vector (created elsewhere) to a function which adapts it, changes its size, etc., but I'm a little unclear as to what is safe to do, because I would normally achieve all of this with pointers. On top of this there is using references to the vector, and this just muddies the water for me.
For instance, take the two following functions and the comments in them:
void function1(std::vector<int>* vec){
    std::cout << "the size of the vector is: " << vec->size() << std::endl; // presumably valid here
    for (int i = 0; i < 10; i++){
        (*vec).push_back(i); // Is this safe? Or will this fail?
        // Or: vec->push_back(i); Any difference?
    }
    std::cout << "the size of the vector is: " << vec->size() << std::endl; // Is this line valid here??
}
AND
void function2(std::vector<int>& vec){
    std::cout << "the size of the vector is: " << vec.size() << std::endl; // presumably valid here
    for (int i = 0; i < 10; i++){
        vec.push_back(i); // Is this safe? Or will this fail?
    }
    std::cout << "the size of the vector is: " << vec.size() << std::endl; // Is this line valid here??
}
Is there any difference between the two functions, both in terms of functionality and in terms of safety?
Or in other words: if I only have a pointer/reference to a vector and need to resize it, how can I be sure where the vector will actually be in memory, or what the pointer to the vector really is, after I operate on it? Thanks.
In terms of functionality, in the very limited context you gave us, they are essentially the same.
From a more general point of view, if you want to write generic code, consider that operations and operators bind directly to references, but not to pointers:
a = b + c;
To compile, this requires
A operator+(const B&, const C&);
But
A* operator+(const B*, const C*);
is a different beast altogether (in fact, it isn't even legal: an overloaded operator must have at least one operand of class or enumeration type).
Also, expressions taking references and taking values have the same syntax, but an expression taking pointers requires the pointers to be dereferenced to provide equal semantics; this leads to a different expression syntax (*a + *b versus a + b) and thus to "less general" code.
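A small sketch of that difference in expression syntax (sum is a hypothetical helper):

// sum is a hypothetical helper illustrating the point
template <typename T>
T sum(const T& a, const T& b) {
    return a + b; // one syntax serves values and references alike
}

int main() {
    int x = 1, y = 2;
    int* px = &x;
    int* py = &y;
    int s1 = sum(x, y);     // references bind directly
    int s2 = sum(*px, *py); // pointers must be dereferenced at the call site
    return s1 - s2;         // both sums are 3, so this returns 0
}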
On the other hand, if you are writing a class hierarchy with runtime polymorphism (and Liskov substitution in mind), you will most likely deal with dynamically allocated objects, and hence manipulating them through pointers may be more natural.
There are "grey areas" where the two things mesh, but in general pointer-taking functions are more frequent in runtime-based OOP frameworks, while reference-taking functions are more frequent in "value-based generic algorithms", where static type deduction is expected and on-stack allocation is most likely wanted.