C++ struct array member without given dimension (flexible array member?) - c++

I compiled and ran the following C++ code, blindly trying to create a flexible array member like you can in C:
#include <iostream>
template <typename T>
struct Vector {
int length;
T ts[];
};
Vector<int> ts = {
3,
{10, 10, 10},
};
int main() {
std::cout << sizeof(ts) << std::endl;
std::cout << ts.ts[1] << std::endl;
return 0;
}
The code compiles and runs just fine, and gives the same output that C would in the same circumstance (it outputs 4 and then 10).
Now, according to this answer from 2010, what I have written should not be valid C++. Furthermore, according to this Wikipedia article, "C++ does not have flexible array members".
My question is, which C++ feature am I actually using in the above code, specifically on the line that says "T ts[];"? Does that code actually do what I think it does in general, or is it undefined behavior?

It is one of those things which are different between C and C++. A flexible array member is valid in C but not C++.
That said, many modern compilers accept C features such as this one in C++ code as an extension, and complain only when you turn up the diagnostics (for example with -pedantic on GCC or Clang).
David Tribble spends a moment on it at his Incompatibilities Between ISO C and ISO C++ page, where he specifically addresses this issue:
C++ does not support flexible array members.
(This feature might be provided as an extension by some C++ compilers, but would probably be valid only for POD structure types.)
So yes, this is undefined behavior. The correct way (in both C and C++) to write such a thing is to give it a non-zero dimension:
template <typename T>
struct Vector {
int length;
T ts[1];
};
You have another issue: you must allocate memory for said object yourself. Simply specifying an initializer is not enough. As far as the compiler is concerned, the object only ever has its minimal size, no matter how you access it.
This “range hack” is so called because the programmer explicitly uses/abuses C's (and C++'s) ability to violate range bounds to do something tricky.
The consequences of this are many, including inability to store these things in any standard container, or pass them around by value, or do most anything that eschews handling it through a pointer. It has its place, but for the vast majority of use cases C++ has superior options.

Related

C++ code example that makes the compiler loop forever

Given that the C++ template system is not context-free and it's also Turing-Complete, can anyone provide me a non-trivial example of a program that makes the g++ compiler loop forever?
For more context, I imagine that if the C++ template system is Turing-complete, it can recognize all recursively enumerable languages and decide over all recursive ones. So, it made me think about the acceptance problem, and its more famous brother, the halting problem. I also imagine that g++ must decide if the input belongs in the C++ language (as it belongs in the decidability problem) in the syntactic analysis. But it also must resolve all templates, and since templates are recursively enumerable, there must be a C++ program that makes the g++ syntactic analysis run forever, since it can't decide if it belongs in the C++ grammar or not.
I would also like to know how g++ deals with such things?
While this is true in theory for the unlimited language, compilers in practice have implementation limits for recursive behavior (e.g. how deep template instantiations can be nested or how many instructions can be evaluated in a constant expression), so that it is probably not straight-forward to find such a case, even if we somehow ignore obvious problems of bounded memory. The standard specifically permits such limits, so if you want to be pedantic I am not even sure that any given implementation has to satisfy these theoretical concepts.
In addition, infinitely recursive template instantiation is specifically forbidden by the language. A program with such a construct has undefined behavior, and the compiler can simply refuse to compile it if the recursion is detected (although of course it cannot be detected in general).
The following shows the limit for clang (Apple clang version 13.1.6, clang-1316.0.21.2.5):
#include <iostream>
template<int V>
struct Count
{
static constexpr int value = Count<V-1>::value + 1;
};
template<>
struct Count<1>
{
static constexpr int value = 1;
};
int main()
{
#ifdef WORK
int v = Count<1026>::value; // This works.
#else
int v = Count<1027>::value; // This will fail to compile.
#endif
std::cout << "V: " << v << "\n";
}

Aligned allocation of elements in vector

I need to have elements in a std::vector aligned to some given step in memory. For example, in the program as follows:
#include <vector>
#include <iostream>
struct __attribute__((aligned(256))) A
{
};
int main()
{
std::vector<A> as(10);
std::cout << &as[0] << std::endl;
std::cout << &as[1] << std::endl;
}
I would expect the last two hexadecimal digits of the printed addresses to be ‘00’.
In practice, I see that it is true in Visual Studio 2019, and in gcc 8+. But can I be absolutely sure, or is it just a coincidence and some custom allocator in std::vector (like boost::alignment::aligned_allocator) is necessary?
You can rely on it without a custom allocator, provided there are no bugs in the implementation of the respective compiler (which can, however, be checked at the assembly level, if required).
Since C++11, there is the alignas specifier, which allows you to request the alignment in a standardized way. When allocator::allocate() is called, the standard allocator calls operator new and, according to the documentation, forwards the alignment information to it (for over-aligned types this is guaranteed only since C++17, which introduced the aligned overloads of operator new). Thus, the standard allocator already respects alignment needs, if specified. However, if the global operator new is replaced by a custom implementation, no such guarantee can be made.

No class template specialization for array of bool?

According to https://en.cppreference.com/, std::vector<bool> has a class template specialization, while std::array<bool, N> does not. What are the reasons it is not provided?
When std::vector was introduced, a specialization for bool was considered a good idea. Basically, at that time, the average computer had 4 MB of memory, so saving computer memory was quite important. Nowadays we just say "memory is cheap" (quote from Uncle Bob).
Later it turned out that this specialization creates more problems than it is worth.
The issue is that a reference to one of the elements of such a vector is a complex proxy object (it has to store which bit holds the value), whereas with a regular old-fashioned C array bool a[] it is a plain reference or pointer to a bool.
Since compatibility must be retained, this specialization can't be dropped, but based on that lesson, the same approach was not applied to std::array.
Another reason is that std::array is supposed to be a C-array wrapper, so it must be as similar to bool a[N] as possible, and must produce the same machine code when used.
And one last thing: as Cody Gray points out in a comment under the question, std::bitset is a fixed-size array of bits, so such functionality is already available (and can be used if needed).
This is a question about the history and evolution of C++. In hindsight, a possible explanation is:
std::vector<bool> was a mistake. It is a major annoyance that a std::vector<bool> is very different from std::vector<T>. Generic code that works with vectors often needs a special case for std::vector<bool>. And users often have to apply weird workarounds like using a std::vector<char> in place of std::vector<bool>. Now we cannot go back without breaking lots of existing code. With what we know now, maybe std::vector<bool> would never have made it into C++.
std::array was added only in C++11. There was no reason to make the same mistake again.
The initial motivation to specialize std::vector for bool was to optimize memory usage.
However this was a bad idea as this specialization behaves differently than usual std::vector (see example below).
This mistake was not repeated later with C++11's std::array:
#include <array>
#include <vector>
int main()
{
std::vector<int> i_v(4);
int i_a = *&i_v[3]; // ok
std::vector<bool> v(4);
bool a = *&v[3]; // Compile-time error
std::array<bool,4> w{}; // value-initialized so that reading w[3] is well-defined
bool b = *&w[3]; // ok
}
The std::vector<bool> specialization was introduced as early as 1994, as per lib.vector.bool of WG21/N0545(1) [emphasis mine]:
23.1.6 Class vector<bool> [lib.vector.bool]
To optimize space allocation, a specialization for bool is provided: [...]
with a motivation to optimize for space allocation, a resource that was scarce back then.
In retrospect, this turned out to be quite a bad idea, and the original motivation was made moot with the rapid growth of available space in computer hardware.
std::array, on the other hand, was introduced much later, in C++11, alongside e.g. auto type deduction, a mechanism which highlighted yet another problem with the std::vector<bool> specialization. Naturally the library spec writers did not repeat the same mistake of std::vector<bool> when designing std::array.
E.g., the following snippet
#include <type_traits>
#include <vector>
int main() {
std::vector<bool> v{false, false, true, true};
auto bool_value = v[1];
static_assert(std::is_same_v<decltype(bool_value), bool>, ""); // Error!
}
fails with an error message saying that bool_value is not of type bool, but of a cryptic, implementation-defined type:
error: static_assert failed due to requirement
'std::is_same_v<
std::__1::__bit_reference<
std::__1::vector<bool, std::__1::allocator<bool>>, true>,
bool>' ""
(1) Working Paper for Draft Proposed International Standard for Information Systems-- Programming Language C++.

Why are C++ tuples so weird?

I usually create custom structs when grouping values of different types together. This is usually fine, and I personally find the named member access easier to read, but I wanted to create a more general purpose API. Having used tuples extensively in other languages I wanted to return values of type std::tuple but have found them much uglier to use in C++ than in other languages.
What engineering decisions went into making element access use an integer valued template parameter for get as follows?
#include <iostream>
#include <tuple>
using namespace std;
int main()
{
auto t = make_tuple(1.0, "Two", 3);
cout << "(" << get<0>(t) << ", "
<< get<1>(t) << ", "
<< get<2>(t) << ")\n";
}
Instead of something simple like the following?
t.get(0)
or
get(t,0)
What is the advantage? I only see problems in that:
It looks very strange using the template parameter like that. I know that the template language is Turing complete and all that but still...
It makes indexing by runtime generated indices difficult (for example for a small finite ranged index I've seen code using switch statements for each possibility) or impossible if the range is too large.
Edit: I've accepted an answer. Now that I've thought about what needs to be known by the language and when it needs to be known I see it does make sense.
Regarding your second point:
It makes indexing by runtime generated indices difficult (for example for a small finite ranged index I've seen code using switch statements for each possibility) or impossible if the range is too large.
C++ is a strongly, statically typed language and has to determine the types involved at compile time.
So a function such as
template <typename ... Ts>
auto foo (std::tuple<Ts...> const & t, std::size_t index)
{ return get(t, index); }
isn't acceptable because the return type depends on the run-time value of index.
The solution adopted: pass the index as a compile-time value, that is, as a template parameter.
As you probably know, the situation is completely different for std::array: there you have accessors (the method at(), or operator[]) that take a run-time index, because in std::array the element type doesn't depend on the index.
The "engineering decisions" for requiring a template argument in std::get<N> are located way deeper than you think. You are looking at the difference between static and dynamic type systems. I recommend reading https://en.wikipedia.org/wiki/Type_system, but here are a few key points:
In static typing, the type of a variable/expression must be known at compile-time. A get(int) method for std::tuple<int, std::string> cannot exist in this circumstance because its return type would depend on the value of the argument, which is not known at compile-time. On the other hand, since template arguments must be known at compile-time, using them in this context makes perfect sense.
C++ does also have dynamic typing in the form of polymorphic classes. These leverage run-time type information (RTTI), which comes with a performance overhead. The normal use case for std::tuple does not require dynamic typing and thus it doesn't allow for it, but C++ offers other tools for such a case.
For example, while you can't have a std::vector that contains a mix of int and std::string, you can totally have a std::vector<Widget*> where IntWidget contains an int and StringWidget contains a std::string as long as both derive from Widget. Given, say,
struct Widget {
virtual ~Widget();
virtual void print();
};
you can call print on every element of the vector without knowing its exact (dynamic) type.
It looks very strange
This is a weak argument. Looks are a subjective matter.
The function parameter list is simply not an option for a value that is needed at compile time.
It makes indexing by runtime generated indices difficult
Runtime generated indices are difficult regardless, because C++ is a statically typed language with no runtime reflection (or even compile time reflection for that matter). Consider the following program:
std::tuple<std::vector<C>, int> tuple;
int index = get_at_runtime();
WHATTYPEISTHIS var = get(tuple, index);
What should be the return type of get(tuple, index)? What type of variable should you initialise? It cannot return a vector, since index might be 1, and it cannot return an integer, since index might be 0. The types of all variables are known at compile time in C++.
Sure, C++17 introduced std::variant, which is a potential option in this case. std::tuple, however, was introduced back in C++11, when this was not an option.
If you need runtime indexing of a tuple, you can write your own get function template that takes a tuple and a runtime index and returns a std::variant. But using a variant is not as simple as using the type directly. That is the cost of introducing runtime type into a statically typed language.
Note that in C++17 you can use structured binding to make this much more obvious:
#include <iostream>
#include <tuple>
using namespace std;
int main()
{
auto t = make_tuple(1.0, "Two", 3);
const auto& [one, two, three] = t;
cout << "(" << one << ", "
<< two << ", "
<< three << ")\n";
}

Is it safe to pass pointer to std::pair as a pointer to an array?

Can I safely assume that the address of the first element of a std::pair can be used as the address of a two-element array? Of course, both elements of the pair are of the same type. The following code works in g++ 7.2, clang 3.8 and vc++14:
#include <iostream>
#include <string>
#include <utility>
void foo(int* a)
{
std::cout << std::to_string(a[0]) << ", " << std::to_string(a[1]) << std::endl;
}
int main()
{
std::pair<int, int> bar(42, 24);
foo(&bar.first);
return 0;
}
As std::pair is a rather simple class, I am convinced that this case can be generalized, but I'm not sure to what extent. For example, does it being a class template have any impact on the question?
If I cannot safely do that, why not? If it's considered valid code, what guarantees this?
Making my comment into an answer:
It is categorically not allowed to read beyond object boundaries, except for arrays (as MSalters pointed out). While a single variable can be considered an array of length 1 (so that a[0] is allowed by virtue of its definition as *(a+0)), reading its non-existing "second element" via a[1] is undefined behavior because it reads beyond bar.first's boundaries. That both objects are probably part of a larger aggregate object (whose implementation is unknown) does not change that.
Note that many boundary transgressions like this one work in practice with known architectures, compilers, libraries, and compiler options. This particular one is likely to work everywhere, because int is designed to have the natural word size of a given machine and can thus be laid out without padding in a struct, which a std::pair certainly will be. But there is no guarantee, least of all against a malicious compiler. In fact, the program as it is presented (namely as one translation unit) can statically be proven invalid. A compiler could detect that and reject compilation.
cppreference says nothing about how the class template std::pair has to be defined (http://en.cppreference.com/w/cpp/utility/pair).
It is not guaranteed that the first and second members will be packed together, even though most implementations will probably look like
template<class A, class B>
struct pair {
[...]
A first;
B second;
[...]
};
For int your code might work. For other types it is very unsafe to make assumptions about what the internals of std::pair look like.
If you need the elements as an array of integers, you have to copy them into a separate location.