Is there a reason for zero sized std::array in C++11? - c++

Consider the following piece of code, which is perfectly acceptable by a C++11 compiler:
#include <array>
#include <iostream>
auto main() -> int {
std::array<double, 0> A;
for(auto i : A) std::cout << i << std::endl;
return 0;
}
According to the standard § 23.3.2.8 [Zero sized arrays]:
1 Array shall provide support for the special case N == 0.
2 In the case that N == 0, begin() == end() == unique value. The return value of
data() is unspecified.
3 The effect of calling front() or back() for a zero-sized array is undefined.
4 Member function swap() shall have a noexcept-specification which is equivalent to
noexcept(true).
As shown above, zero-sized std::arrays are perfectly allowable in C++11, in contrast with zero-sized built-in arrays (e.g., int A[0];), which are explicitly forbidden by the standard, yet are accepted by some compilers (e.g., GCC) as a non-standard extension.
Considering this "contradiction", I have the following questions:
Why did the C++ committee decide to allow zero-sized std::arrays?
Are there any valuable uses?

If you have a generic function, it is bad if that function randomly breaks for special parameters. For example, let's say you have a template function that takes N random elements from a vector:
template<typename T, size_t N>
std::array<T, N> choose(const std::vector<T> &v) {
...
}
Nothing is gained if this causes undefined behavior or compiler errors if N for some reason turns out to be zero.
For raw arrays, one reason behind the restriction is that you don't want types with sizeof(T) == 0, because this leads to strange effects in combination with pointer arithmetic. An array with zero elements would have size zero unless special rules were added for it.
But std::array<> is a class, and classes always have size > 0. So you don't run into those problems with std::array<>, and a consistent interface without an arbitrary restriction of the template parameter is preferable.
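As a minimal sketch of the generic-code argument, the helper below (take_first is a hypothetical name, not from the question) copies the first N elements of a vector into a std::array. It assumes T is default-constructible and that the vector holds at least N elements. The same template instantiates cleanly for N == 0, and the static_assert shows that even an empty std::array has a non-zero size:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// A class type never has size zero, even with no elements.
static_assert(sizeof(std::array<double, 0>) > 0,
              "std::array<T, 0> is still a complete object type");

// Hypothetical helper: copies the first N elements of v.
// Precondition (assumed): v.size() >= N.
template <typename T, std::size_t N>
std::array<T, N> take_first(const std::vector<T>& v) {
    assert(v.size() >= N);
    std::array<T, N> out{};
    std::copy_n(v.begin(), N, out.begin());  // copies nothing when N == 0
    return out;
}
```

Calling take_first<int, 0>(v) returns an array for which empty() is true and begin() == end(), so range-based loops over the result simply do nothing.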

One use that I can think of is that a function can return a zero-length array, and the caller can check for that case specifically.
For example see the documentation on the std::array function empty(). It has the following return value:
true if the array size is 0, false otherwise.
http://www.cplusplus.com/reference/array/array/empty/
I think the ability to return and check for zero-length arrays is in line with how other standard library types, e.g. vectors and maps, behave, and is therefore useful.

As with other container classes, it is useful to be able to have an object that represents an array of things, and for that array to be allowed to be empty. If that were not possible, one would need to create another object, or a managing class, to represent that state in a legal way. Having that ability built into all container classes is very helpful. When using it, one just needs to be in the habit of treating the array as a container that might be empty, and checking the size or index before referring to an element in cases where there might be none.

There are actually quite a few cases where you want to be able to do this. It's present in a lot of other languages too. For example Java actually has Collections.emptyList() which returns a list which is not only size zero but cannot be expanded or resized or modified.
An example usage might be if you had a class representing a bus and a list of passengers within that class. The list might be lazy initialized, only created when passengers board. If someone calls getPassengers() though then an empty list can be returned rather than creating a new list each time just to report empty.
Returning null would also work for the internal efficiency of the class - but would then make life a lot more complicated for everyone using the class since whenever you call getPassengers() you would need to null check the result. Instead if you get an empty list back then so long as your code doesn't make assumptions that the list is not empty you don't need any special code to handle it being null.
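The same pattern can be sketched in C++. The Bus class and its member names below are hypothetical and only mirror the example described above: the passenger list is created lazily, and getPassengers() hands out a shared empty vector instead of a null pointer.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical class mirroring the bus example; names are illustrative.
class Bus {
public:
    void board(std::string name) {
        // The list is only allocated once the first passenger boards.
        if (!passengers_) {
            passengers_ = std::make_unique<std::vector<std::string>>();
        }
        passengers_->push_back(std::move(name));
    }

    // Never returns "null": callers can iterate without a special case.
    const std::vector<std::string>& getPassengers() const {
        static const std::vector<std::string> kEmpty;
        return passengers_ ? *passengers_ : kEmpty;
    }

private:
    std::unique_ptr<std::vector<std::string>> passengers_;
};
```

A caller can write a plain range-based for loop over getPassengers() whether or not anyone has boarded.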

Related

In C++ can I treat an array of single-member unions as an array of the element?

Suppose I am writing a fixed-size array class of runtime size, somewhat equivalent to Rust's Box<[T]> in order to save the space of tracking capacity when I know the array isn't going to change size after initialization.
In order to support types which do not have a default constructor, I want to be able to allow the user to supply a generator function that takes the index of the element and produces a T. In order to do this and decouple allocation and initialization, I follow the advice in CJ Johnson's CppCon 2019 talk "How to Hold a T" and initially create the array in terms of a single-member union:
template<typename T>
union MaybeUninit
{
MaybeUninit() {}
~MaybeUninit() {}
T val;
};
// ...
m_array = new MaybeUninit<T>[size];
// initialize all elements by setting val for each item
T* items = reinterpret_cast<T*>(m_array); // is this OK and dereferenceable?
My question is, once the generator is done and all the elements of m_array are initialized, am I allowed (according to the standard, regardless of whether a given compiler implementation permits it) to use reinterpret_cast<T*>(m_array) to treat the result as an array of the actual objects (the line marked "is this OK")? If not, is there any way to get from MaybeUninit<T>* to T* without copying?
In terms of which standard, I'm mainly interested in C++17 or later.
am I allowed (according to the standard, regardless of whether a given compiler implementation permits it) to use reinterpret_cast<T*>(m_array) to treat the result as an array of the actual objects (the line marked "is this OK")?
No, any pointer arithmetic on the resulting pointer will result in UB (for indices >1) or result in a one-past the end pointer that can't be dereferenced (for index 1). Only accessing the element at index 0 this way is allowed (but needs to still be constructed).
Pointer arithmetic is only allowed on pointers to the elements of an array. Your pointer does not point to an element of an array of T, so for the purpose of pointer arithmetic the object it points to is treated as belonging to an array of length 1.
If not, is there any way to get from MaybeUninit* to T* without copying?
The pointer conversion is not an issue, but you can't index into the resulting pointer. The only way to avoid this is to have an actual array of T objects.
Note however that you don't need to construct every element in an array of T objects. For example:
std::allocator<T> alloc;
T* ptr = std::allocator_traits<decltype(alloc)>::allocate(alloc, size);
Now ptr is a pointer to an array of size objects of type T, but no actual objects of type T in it have their lifetime started. You can construct individual elements into it with placement-new (or std::construct_at or std::allocator_traits<decltype(alloc)>::construct) and destroy them with a destructor call (or std::destroy_at or std::allocator_traits<decltype(alloc)>::destroy). You need to do this with your union approach as well anyhow. This approach also allows you to easily exchange the allocator with a different one.
There will be no overhead for size or capacity management. All of that is now responsibility of the user. Whether this is a good idea is a different question.
Instead of std::allocator or an alternative Allocator implementation you could also use other functions that allocate memory and are specified to implicitly create objects, e.g. operator new or std::malloc, etc.
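Putting the allocator approach together, a fixed-size array that builds each element from a generator might look roughly like the following. This is a sketch assuming C++17; FixedArray and Item are hypothetical names, and copy/move support is left out for brevity.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical element type without a default constructor.
struct Item {
    explicit Item(int v) : value(v) {}
    int value;
};

// Hypothetical fixed-size array: allocates once, then constructs
// element i from gen(i). No capacity is tracked.
template <typename T>
class FixedArray {
public:
    template <typename Gen>
    FixedArray(std::size_t n, Gen gen) : data_(alloc_.allocate(n)), size_(n) {
        std::size_t built = 0;
        try {
            for (; built < n; ++built) {
                ::new (static_cast<void*>(data_ + built)) T(gen(built));
            }
        } catch (...) {
            // Undo the elements already constructed, then release storage.
            std::destroy_n(data_, built);
            alloc_.deallocate(data_, n);
            throw;
        }
    }

    ~FixedArray() {
        std::destroy_n(data_, size_);
        alloc_.deallocate(data_, size_);
    }

    FixedArray(const FixedArray&) = delete;
    FixedArray& operator=(const FixedArray&) = delete;

    T& operator[](std::size_t i) { return data_[i]; }
    const T& operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    std::allocator<T> alloc_;  // declared first so it exists before data_
    T* data_;
    std::size_t size_;
};
```

Because data_ really is a T* into storage obtained for an array of T, indexing it is ordinary pointer arithmetic and no reinterpret_cast is involved.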

Why is default construction disallowed in std::span with static extent?

Take a look at this example:
#include <span>
#include <vector>
class Data
{
public:
Data() = default;
template<class R>
explicit Data(R& r)
: buf_(r)
, header_(buf_.first<4>())
{}
private:
std::span<char> buf_;
// compile error
// std::span<char, 4> header_;
// compiles, but undefined behavior (violates the precondition of [span.sub])
std::span<char, 4> header_{buf_.first<4>()};
};
int main()
{
std::vector<char> buf(1234);
Data data{buf};
}
The reason of compile error is because it is explicitly disallowed in the standard [span.cons]:
constexpr span() noexcept;
Constraints: Extent == dynamic_extent || Extent == 0 is true.
Postconditions: size() == 0 && data() == nullptr.
Why does this constraint exist?
Since default construction for a span with dynamic_extent is already allowed, it feels like these two cases are semantically same:
std::span<char> header_; // allowed
std::span<char, 4> header_; // why disallowed?
Furthermore, take a look at the wording of [span.obs]:
constexpr size_type size() const noexcept;
Effects: Equivalent to: return size_;
Let's say for example, the standard could even define stricter implementation details to encourage static optimization. If this wording was something like Returns Extent when Extent != dynamic_extent, the size_ data member can be omitted in runtime. If so, I understand that the default construction is invalid, because in that case size() will always return invalid size, i.e. Extent, which is not zero.
However, current standard exposes the non-static member variables in the class definition [span.overview]:
private:
pointer data_; // exposition only
size_type size_; // exposition only
Since we already have those variables, can't the standard library just set data_ = nullptr and size_ = 0 when statically sized span is default constructed? I surely can live with the current wording, but isn't the current standard expecting too strong constraints?
Note that the committee have once attempted to fix [span.cons] already in LWG3198, so I'm pretty sure that they have some rationale for current wording.
A default-constructed std::span<T> successfully refers to 0 objects of type T starting at nullptr. To what 3 objects of type T does std::span<T,3>() refer?
The size_ member always being present is purely a narrative device to simplify the specification; it means nothing and really isn’t there in practice for static-extent spans.
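To make that point concrete, here is a deliberately simplified stand-in for a static-extent span. StaticView is a hypothetical class written for illustration, not the standard library's implementation. It stores only a pointer, size() is just the template argument, and there is nothing sensible a default constructor could do:

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

// Hypothetical, simplified model of a span with a static extent.
template <typename T, std::size_t N>
class StaticView {
public:
    // The only way to get a view of N elements is to supply N elements.
    explicit StaticView(std::array<T, N>& a) noexcept : data_(a.data()) {}

    constexpr std::size_t size() const noexcept { return N; }  // no size_ member
    T& operator[](std::size_t i) const noexcept { return data_[i]; }

private:
    T* data_;
};

static_assert(sizeof(StaticView<char, 4>) == sizeof(char*),
              "only the pointer is stored");
static_assert(!std::is_default_constructible<StaticView<char, 4>>::value,
              "a default-constructed view could not refer to 4 elements");
```

With this layout, a default-constructed object would report size() == 4 while pointing at nothing, which is exactly the situation the constraint rules out.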
Why does this constraint exist?
Because that's what it means to ask for or specify the size of a span. If you are given a span, any span, the expectation is that span::size will return the number of elements in the array. Period.
If you create a span with X elements in it, you are required to provide a pointer range that actually stores X elements in it. You may of course lie to span, passing it a pointer to fewer elements. But you are breaking your part of the contract; you said that there were X elements in the array, and you provided fewer than X. The UB that results is on you.
It doesn't matter if X is a runtime or compile-time value: the requirement is the same.
To default-construct a statically-sized span is to allow the user to lie by default. A default-initialized span<T, 3> is lying to every user that gets one. It claims to have 3 elements, but it does not.
Good APIs do not let users lie to them by default.
I tend to write this kind of code when no other solution is available:
inline static std::array<char, 4> dummy_header_v;
std::span<char, 4> header_{dummy_header_v};
Personally, this is just a nasty workaround and I really don't like it.
However, there is some benefit to this workaround: the user can expect compiler optimizations (such as loop unrolling) whenever one reads from Data::header_, since it is a statically sized span std::span<T, N> in the first place. This is useful when you already know the partial buffer size by definition, and when you're going to fetch the real buffer lazily. If the span were declared as std::span<T>, the compiler would have no static size to optimize against.
If the spec allowed default construction, there would be no need for this workaround.
I posted this workaround just for reference; I really appreciate other proper solutions.

Declaring arrays in C++

I am new to C++ and currently learning it by myself with a book. This book seems to say that there are several kinds of arrays depending on how you declare them. I guess the difference between dynamic arrays and static arrays is clear to me. But I do not understand the difference between the STL std::array class and a static array.
An STL std::array variable is declared as:
std::array < int, arraySize > array1;
Whereas a static array variable is declared as:
int array1[arraySize];
Is there a fundamental difference between the two? Or is it just syntax and the two are basically the same?
A std::array<> is just a light wrapper around a C-style array, with some additional nice interface member functions (like begin, end etc) and typedefs, roughly defined as
template<typename T, size_t N>
class array
{
public:
T _arr[N];
T& operator[](size_t);
const T& operator[](size_t) const;
// other member functions and typedefs
};
One fundamental difference though is that the former can be passed by value, whereas for the latter you only pass a pointer to its first element or you can pass it by reference, but you cannot copy it into the function (except via a std::copy or manually).
A common mistake is to assume that every time you pass a C-style array to a function you lose its size due to the array decaying to a pointer. This is not always true. If you pass it by reference, you can recover its size, as there is no decay in this case:
#include <iostream>
template<typename T, size_t N>
void f(T (&arr)[N]) // the type of arr is T(&)[N], not T*
{
std::cout << "I'm an array of size " << N;
}
int main()
{
int arr[10];
f(arr); // outputs its size, there is no decay happening
}
The main difference between these two is an important one.
Besides the nice methods the STL gives you, when passing a std::array to a function, there is no decay. Meaning, when you receive the std::array in the function, it is still a std::array, but when you pass an int[] array to a function, it effectively decays to an int* pointer and the size of the array is lost.
This difference is a major one. Once you lose the array size, the code is now prone to a lot of bugs, as you have to keep track of the array size manually. sizeof() returns the size of a pointer type instead of the number of elements in the array. This forces you to manually keep track of the array size using interfaces like process(int *array, int size). This is an ok solution, but prone to errors.
See the guidelines by Bjarne Stroustrup:
https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#Rp-run-time
That can be avoided with a better data type, which std::array is designed for, among many other STL classes.
As a side note, unless there's a strong reason to use a fixed size array, std::vector may be a better choice as a contiguous memory data structure.
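The contrast between the two interfaces can be sketched as follows (sum_raw and sum_array are hypothetical names chosen for illustration). With the pointer version the caller has to supply a matching size; with std::array the size is part of the type and the by-value parameter is a genuine copy:

```cpp
#include <array>
#include <cstddef>
#include <numeric>

// C-style interface: the array has decayed to a pointer, so the
// element count must be passed separately and kept correct by hand.
int sum_raw(const int* data, std::size_t size) {
    return std::accumulate(data, data + size, 0);
}

// std::array interface: N travels with the type, and passing by
// value copies all N elements instead of decaying to a pointer.
template <std::size_t N>
int sum_array(std::array<int, N> values) {
    return std::accumulate(values.begin(), values.end(), 0);
}
```

Passing the wrong size to sum_raw compiles without complaint, whereas sum_array cannot be called with a mismatched length at all.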
std::array and C-style arrays are similar:
They both store a contiguous sequence of objects
They are both aggregate types and can therefore be initialized using aggregate initialization
Their size is known at compile time
They do not use dynamic memory allocation
An important advantage of std::array is that it can be passed by value and doesn't implicitly decay to a pointer like a C-style array does.
In both cases, the array is created on the stack when it is declared as a local variable.
However, the STL's std::array class template offers some advantages over the "raw" C-like array syntax of your second case:
int array1[arraySize];
For example, with std::array you have a typical STL interface, with methods like size (which you can use to query the array's element count), front, back, at, etc.
Is there a fundamental difference between the two? or is it just syntax and the two are basically the same?
There's a number of differences for a raw c-style array (built-in array) vs. the std::array.
As you can see from the reference documentation there's a number of operations available that aren't with a raw array:
E.g.: Element access
at()
front()
back()
data()
The underlying data type of the std::array is still a raw array, but garnished with "syntactic sugar" (if that should be your concern).
The key difference between std::array<> and a C-style array is that the former is a class that wraps the latter. The class has begin() and end() methods that allow std::array objects to be easily passed to STL algorithms that expect iterators (note that C-style arrays can be too, via the non-member std::begin/std::end functions). begin() points to the beginning of the array and end() points one element beyond its end. You see this pattern with other STL containers, such as std::vector, std::map, std::set, etc.
What's also nice about the STL std::array is that it has a size() method that lets you get the element count. To get the element count of a C-style array, you'll have to write sizeof(cArray)/sizeof(cArray[0]), so doesn't stlArray.size() look much more readable?
You can get full reference here:
http://en.cppreference.com/w/cpp/container/array
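As a small sketch of the non-member functions mentioned above, the hypothetical helper sort_and_count below works unchanged on a C-style array and on a std::array. It assumes C++17 for std::size, which replaces the sizeof division for built-in arrays:

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <iterator>

// Hypothetical helper: sorts any range accepted by std::begin/std::end
// and returns its element count.
template <typename Range>
std::size_t sort_and_count(Range& r) {
    std::sort(std::begin(r), std::end(r));
    return std::size(r);  // C++17; works for C-style arrays and containers
}
```

This only works for a C-style array while it is still an array, i.e. when it is passed by reference as here; once it has decayed to a pointer, std::begin, std::end and std::size no longer apply.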
Usually you should prefer std::array<T, size> array1; over T array2[size];, although the underlying structure is identical.
The main reason for that is that std::array always knows its size. You can call its size() method to get the size. Whereas when you use a C-style array (i.e. what you called "built-in array") you always have to pass the size around to functions that work with that array. If you get that wrong somehow, you could cause buffer overflows, where the function tries to read from or write to memory that does not belong to the array anymore. This is much harder to get wrong with std::array, because the size is part of the type and always available.
IMO,
Pros: It’s efficient, in that it doesn’t use any more memory than built-in fixed arrays.
Cons: compared with a built-in fixed array, std::array has a slightly more awkward syntax, and before C++17 you have to explicitly specify the array length (the compiler won’t calculate it for you from the initializer; C++17's class template argument deduction lifts this restriction).
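On the length point specifically, a C++17 compiler can deduce both the element type and the length of a std::array from its initializer. A minimal sketch, assuming C++17 (deduced_length is a hypothetical function name):

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

// Hypothetical function showing class template argument deduction.
std::size_t deduced_length() {
    std::array values{1.5, 2.5, 3.5};  // deduced as std::array<double, 3>
    static_assert(
        std::is_same<decltype(values), std::array<double, 3>>::value,
        "element type and length come from the initializer");
    return values.size();
}
```

All initializers must have the same type for the deduction to succeed, so mixing, say, int and double values in the braces is an error rather than a silent conversion.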

Passing reference to deque delete function

I have been given an assignment and I'm struggling to figure out how I'm supposed to implement it.
I've pasted the parts of the assignment that puzzled me below
Write a deque class to hold a list of integers which is implemented internally with a circular array. The size of the array can be passed in the constructor, or you can decide on a default value. The class will maintain data members which hold the index position of the head and tail of the list.
The class should have member functions:
• bool isEmpty();
• bool isFull();
• bool insertFront(int)
• bool removeFront(int&)
• bool insertBack(int)
• bool removeBack(int&)
prints all items in the array by removing them one at a time from the front.
So I've written all my functions and have the deque working. The things I struggled with are:
"The size of the array can be passed in the constructor"
So to accomplish this I declared a pointer called array in my class and then did array = new int[size] in my constructor. Is this the only way to do this? I'm happy enough that it works, but I'm not sure if there's a better solution. I was thinking of a vector, but thought that would have been too easy. I also could have declared a const for the size and initialized the array in my class, but again, too easy.
The bool removeFront(int&) and bool removeBack(int&) functions really confused me. What reference am I supposed to be passing in? Also, the return type is bool, but later in the assignment I'm asked to "prints all items in the array by removing them one at a time from the front". How can I do this with a return type of bool, rather than int?
I have changed my functions to remove the reference and have a return type of int to get the code to work, but would like to know how to implement it the way the assignment asks for?
Based on the requirements listed, the intent of the function arguments is unambiguous. Here is why:
Take
bool removeFront(int& );
This not only removes the element at the front of the buffer and stores it in the argument passed by reference; the function also returns a bool indicating whether the removal succeeded.
An example usage would be like this:
int elem;
while (removeFront(elem)) {
printf("element : %d ", elem);
}
Here the variable "elem" is passed in by reference. Hence, upon a successful execution of removeFront() you will have elem filled in with the value of the element just removed.
The same reasoning applies to other similar methods. Please go back to using a reference mode parameter as given in the original specification.
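For reference, a compact sketch of such a deque follows. It is one possible reading of the assignment, not the expected solution: the class name IntDeque and the default capacity of 8 are assumptions, it stores a head index plus an element count rather than separate head and tail indices, and it uses std::vector for storage to keep the example short.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical circular-array deque of ints.
class IntDeque {
public:
    explicit IntDeque(std::size_t capacity = 8)
        : buf_(capacity), head_(0), count_(0) {}

    bool isEmpty() const { return count_ == 0; }
    bool isFull() const { return count_ == buf_.size(); }

    bool insertFront(int value) {
        if (isFull()) return false;
        head_ = (head_ + buf_.size() - 1) % buf_.size();  // step back, wrapping
        buf_[head_] = value;
        ++count_;
        return true;
    }

    bool insertBack(int value) {
        if (isFull()) return false;
        buf_[(head_ + count_) % buf_.size()] = value;
        ++count_;
        return true;
    }

    // On success, stores the removed element in out and returns true.
    bool removeFront(int& out) {
        if (isEmpty()) return false;
        out = buf_[head_];
        head_ = (head_ + 1) % buf_.size();
        --count_;
        return true;
    }

    bool removeBack(int& out) {
        if (isEmpty()) return false;
        out = buf_[(head_ + count_ - 1) % buf_.size()];
        --count_;
        return true;
    }

private:
    std::vector<int> buf_;
    std::size_t head_;   // index of the front element
    std::size_t count_;  // number of stored elements
};
```

The bool return value is what lets a loop such as while (d.removeFront(elem)) terminate on its own once the deque is empty.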
The int& argument is not for a count of elements as other answer suggested.
Answer to Part-1:
Your solution is pretty decent. You could also use std::array for storing the elements, although that only works when the size is known at compile time. There is an advanced trick to do in-place allocation of a variable length array - but, that is beyond the scope of this question.
"The size of the array can be passed in the constructor"
Unless you were told otherwise, use a vector. Using old school arrays is just asking for trouble.
The "bool removeFront(int&)" and "bool removeBack(int&)" functions really confused me, what reference am I supposed to be passing in?
It's a matter of personal preference, but passing in a single int as a reference might be rather unnecessary, what the functions do (if I understood your problem correctly) is remove the element of the array that is at the position of the int you are passing as argument. If said element is correctly removed, you might want to return a true value, otherwise return a false one.
EDIT: Upon re reading the post, what the functions might do is simply remove the 'int' amount of elements from the front or back of the array. Return values should work as previously stated
but later in the assignment I'm asked to "prints all items in the array by removing them one at a time from the front" how can I do this with a return type of bool, rather than int?
The return type of the function has nothing to do with this (unless you were asked to do it recursively). Simply do a loop that starts at the beginning of the array and outputs its content, deletes that same element, then jumps to the next and repeats the process until it is out of elements. Again, this is much safer to do with any of the STL containers since you can use iterators.

How to correctly (yet efficiently) implement something like "vector::insert"? (Pointer aliasing)

Consider this hypothetical implementation of vector:
template<class T> // ignore the allocator
struct vector
{
typedef T* iterator;
typedef const T* const_iterator;
template<class It>
void insert(iterator where, It begin, It end)
{
...
}
...
}
Problem
There is a subtle problem we face here:
There is the possibility that begin and end refer to items in the same vector, after where.
For example, if the user says:
vector<int> items;
for (int i = 0; i < 1000; i++)
items.push_back(i);
items.insert(items.begin(), items.end() - 2, items.end() - 1);
If It is not a pointer type, then we're fine.
But we don't know, so we must check that [begin, end) does not refer to a range already inside the vector.
But how do we do this? According to C++, if they don't refer to the same array, then pointer comparisons would be undefined!
So the compiler could falsely tell us that the items don't alias, when in fact they do, giving us unnecessary O(n) slowdown.
Potential solution & caveat
One solution is to copy the entire vector every time, to include the new items, and then throw away the old copy.
But that's very slow in scenarios such as in the example above, where we'd be copying 1000 items just to insert 1 item, even though we might clearly already have enough capacity.
Is there a generic way to (correctly) solve this problem efficiently, i.e. without suffering from O(n) slowdown in cases where nothing is aliasing?
You can use the predicates std::less etc, which are guaranteed to give a total order, even when the raw pointer comparisons do not.
From the standard [comparisons]/8:
For templates greater, less, greater_equal, and less_equal, the specializations for any pointer type yield a total order, even if the built-in operators <, >, <=, >= do not.
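A minimal sketch of such an overlap check is shown below. The helper points_into is a hypothetical name, and it relies on the assumption that an address outside the array never sorts between the array's first and one-past-the-end pointers under the total order; that holds on common flat-memory implementations.

```cpp
#include <functional>

// Hypothetical helper: reports whether p lies in [first, last).
// Uses std::less / std::greater_equal, whose pointer specializations
// yield a total order even across unrelated objects.
template <typename T>
bool points_into(const T* p, const T* first, const T* last) {
    return std::greater_equal<const T*>()(p, first) &&
           std::less<const T*>()(p, last);
}
```

An insert implementation could call this on the address of the first source element when the iterator type is a pointer, and only fall back to the copy-first path when the check reports aliasing.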
But how do we do this? According to C++, if they don't refer to the same array, then pointer comparisons would be undefined!
Wrong. The pointer comparisons are unspecified, not undefined. From C++03 §5.9/2 [expr.rel]:
[...] Pointers to objects or functions of the same type (after pointer conversions) can be compared, with a result defined as follows:
[...]
-Other pointer comparisons are unspecified.
So it's safe to test if there is an overlap before doing the expensive-but-correct copy.
Interestingly, C99 differs from C++ in this, in that pointer comparisons between unrelated objects are undefined behavior. From C99 §6.5.8/5:
When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. [...] In all other cases, the behavior is undefined.
Actually, this would be true even if they were regular iterators. There's nothing stopping anyone doing
std::vector<int> v;
// fill v
v.insert(v.end() - 3, v.begin(), v.end());
Determining if they alias is a problem for any implementation of iterators.
However, the thing you're missing is that you're the implementation, you don't have to use portable code. As the implementation, you can do whatever you want. You could say "Well, in my implementation, I follow x86 and < and > are fine to use for any pointers.". And that would be fine.