C++ alignment of multidimensional array structure - c++

In my code, I have to consider an array of arrays, where the inner arrays are of a fixed dimension. In order to make use of STL algorithms, it is useful to actually store the data as array of arrays, but I also need to pass that data to a C library, which takes a flattened C-style array.
It would be great to be able to convert (i.e. flatten) the multi-dimensional array cheaply and in a portable way. I will stick to a very simple case, the real problem is more general.
struct my_inner_array { int data[3]; };
std::vector<my_inner_array> x(15);
Is
&(x[0].data[0])
a pointer to a contiguous block of memory of size 45*sizeof(int) containing the same entries as x? Or do I have to worry about alignment? I am afraid that this will work for me (at least for certain data types and inner array sizes) but that it is not portable.
Is this code portable?
If not, is there a way to make it work?
If not, do you have any suggestions what I could do?
Does it change anything at all if my_inner_array is not a POD struct, but contains some methods (as long as the class does not contain any virtual methods)?

1. Theoretically, no. The compiler may decide to add padding to my_inner_array. In practice, I don't see a reason why the compiler would add padding to a struct that has only an array in it. In such a case there's no alignment problem creating an array of such structs. You can use a compile-time assert:
typedef int my_inner_array_array[3];
BOOST_STATIC_ASSERT(sizeof(my_inner_array) == sizeof(my_inner_array_array));
4. If there are no virtual methods it shouldn't make any difference.
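A minimal sketch of that check, assuming C++11 so that the built-in static_assert can replace BOOST_STATIC_ASSERT. The names sum_flat and flat_view are hypothetical; sum_flat only stands in for the C library function that takes a flattened array. Note that this guards against padding only: strictly speaking, the standard's pointer arithmetic rules do not bless walking from one inner array into the next, even though mainstream implementations behave as expected.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct my_inner_array { int data[3]; };

// Fail the build if the compiler adds padding to the struct.
static_assert(sizeof(my_inner_array) == 3 * sizeof(int),
              "my_inner_array has padding; the flattened view is not valid");

// Hypothetical stand-in for the C library function that takes a flat array.
extern "C" long sum_flat(const int* values, std::size_t count) {
    long total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}

// Hypothetical helper: pointer to the first int stored in x.
inline const int* flat_view(const std::vector<my_inner_array>& x) {
    assert(!x.empty());  // x[0] does not exist for an empty vector
    return &x[0].data[0];
}
```

With x holding 15 elements, the call would look like sum_flat(flat_view(x), 3 * x.size()).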

Related

Array of class holding an array memory layout

If we have a class which holds an array, let's call it vector and hold the values in a simple array called data:
class vector
{
public:
double data[3];
<...etc..>
};
Note: I call it vector only to make the explanation clearer; it is not std::vector!
So my question is: if, besides this array, the class contains only typedefs and some constexpr members, am I correct that an object of the class will be just 3 doubles, one after another in memory?
And then if i create an array of vectors like:
vector vl[3];
Note: the size of the array is not always known at compile time; I use 3 only for the example.
Then in memory it will be just 9 doubles, one after another, right?
So will vl[0].data[3] always return the second vector's first element? And in that case, is it guaranteed that the memory layout is always the same as that of a simple array?
I found only cases with array of arrays, but not with array of classes holding an array, and I'm not sure if it is exactly the same at the end. I made some tests and it seems like it is working as I expected, but I don't know if it is always true..
Thank you!
Mostly, yes.
The standard doesn't promise that there never is anything after data in the representation of a vector, but all the implementations that I know of won't add any padding in this case.
What is promised is that there is no padding before data in the representation of vector, because it is a StandardLayout type.
You are right with your first example: The class layout is like a C struct. The first member resides at the address of the struct itself, and if it is an array, all the array's members are adjacent.
Between struct members, however, there may be padding, so there is no guarantee that the size of a struct is the sum of all member sizes. I'd have to dig into the standard, but I assume this includes padding at the end. This answer affirms that; assert(sizeof(vector) == 3*sizeof(double)) may not hold. In reality, I'd assume that an implementation may pad a struct containing three chars so that the struct aligns at word boundaries in an array, but not one containing three doubles, which are typically the type with the strongest alignment requirements. But there is no guarantee across implementations, architectures and compiler options: imagine we switch to 128-bit CPUs.
With respect to your second example: The above applies recursively, so the standard gives no guarantee that the 9 doubles will be adjacent. On the other hand, I bet they will be, and the program can assert it with a simple compile-time static_assert.
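The checks mentioned in these answers could be written roughly as follows. This is a sketch under assumptions: the class is named vec3 here (a hypothetical name, chosen to avoid confusion with std::vector), and flat_get is a hypothetical helper that reaches the i-th double with index arithmetic instead of reading past the end of data.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Stand-in for the question's class: one array plus typedefs/constexpr only.
class vec3 {
public:
    typedef double value_type;
    static constexpr std::size_t size = 3;
    double data[3];
};

// No padding before data: guaranteed for standard-layout types.
static_assert(std::is_standard_layout<vec3>::value,
              "vec3 must be a standard-layout type");
static_assert(offsetof(vec3, data) == 0, "data must sit at offset 0");

// No padding after data: not guaranteed, so check it at compile time.
static_assert(sizeof(vec3) == 3 * sizeof(double), "vec3 has trailing padding");
static_assert(sizeof(vec3[3]) == 9 * sizeof(double),
              "an array of vec3 is not 9 adjacent doubles");

// Hypothetical helper: i-th double of an array of `count` vec3 objects,
// addressed without indexing past the end of any inner array.
inline double flat_get(const vec3* arr, std::size_t count, std::size_t i) {
    assert(i < count * 3);
    return arr[i / 3].data[i % 3];
}
```

Here flat_get(vl, 3, 3) reads the same double the question expects from vl[0].data[3], but stays inside the bounds of each data member.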

Best array type for a data member of an object for C++?

I've recently started learning C++ (having already a lot of experience with C).
I've briefly looked at vector<..> and array<..>.
I was wondering what is the best array type for a data member of an object for C++. Please keep in mind I want encapsulation, so this data member will be private - so I will need getter and setter functions for it.
I know the length of the array (the length will be kept constant, so no reallocation will be needed).
Would the traditional int array[100]; be the best?
Thanks in advance! :)
When you know the length of the array at compile time, you should probably go with array. You could go for vector too, but that might make somebody think that the size could potentially change (or is at least not determined at compile time). If you're using large arrays and the variable lives in local scope, you should consider using vector anyway.
Using int array[100]; could also be an alternative, it has some advantages and some disadvantage.
The advantage is that it might be slightly faster to set up (it would probably be faster than vector anyway) and you can initialize it in the classical way. Another is that some implementation will allow for classic array with variable length decided on instantiation (I don't think it has made it into the standard, but it's rather easy to support), if of course you accept to rely on the implementation supporting this extension.
The disadvantage is that you don't get easy full access to the STL methods (you still have the possibility via std::begin and std::end to get an iterator for the array), but also that if created as local variable you're bound to use stack space for storing the objects as opposed to vector which would need to dynamically allocate space for the storage (array can potentially use stack space).
Since you know C, I'll give you an analogy in terms of that language.
std::vector is used like int* array = malloc(sizeof(int) * size) is used in C. If the array is big and you don't want the owning object to be big, then use std::vector. This is important if you want your object to be efficiently movable or swappable. If you consider std::vector, don't forget to evaluate std::deque as well.
A manually allocated dynamic array has no advantages over std::vector.
std::array is used like int array[100] is used in C. The lack of separate dynamic allocation makes creation of std::array fast. If you have many objects that contain small arrays, then std::array might be a good choice. If the size of the array is not constant or not known at compile time, then you cannot use std::array. In that case, use std::vector instead.
A regular C-style array does have one small advantage over std::array: when you initialize it with curly brackets, you may omit the size. With std::array, you must specify the size even if it seems redundant (at least before C++17, whose class template argument deduction lets you write std::array a{1, 2, 3};). This slightly nicer syntax does not outweigh the advantages of std::array, though. One significant advantage of std::array is that unlike a regular C-style array, it can be passed as a parameter and returned by value.
So, in conclusion, the bestness of an array depends on your needs. In some case, std::array is better and in others std::vector is. In some cases, std::array is not an option at all. There's no need for the C-style alternatives.
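For the case in the question (a private, fixed-length data member with a getter and setter), a std::array member might look like the sketch below. The class name Samples and the member names are hypothetical; bounds are checked with assert only, which is one possible design choice among several.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

class Samples {
public:
    static constexpr std::size_t kSize = 100;

    // Getter for a single element.
    int get(std::size_t i) const {
        assert(i < kSize);
        return values_[i];
    }

    // Setter for a single element.
    void set(std::size_t i, int value) {
        assert(i < kSize);
        values_[i] = value;
    }

    // Read-only access to the whole array, e.g. for standard algorithms.
    const std::array<int, kSize>& values() const { return values_; }

private:
    std::array<int, kSize> values_{};  // value-initialized: all zeros
};
```

Because the storage lives inside the object, a Samples instance is 100 ints large wherever it is created; switching the member to std::vector<int> would move that storage to the heap.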

Zero a 2d array in C++. Do I need 2 for loops?

How can I zero a 2D array in C++? Do I need two for loops just for that?
Coming from other higher languages, I wonder why C++ doesn't initialize arrays to meaningful/sensible defaults? Do I always need to declare an array then "zero" it out right afterwards?
C++ language tries to follow the principle of "you don't pay for what you don't use". It doesn't initialize fundamental types to any default values because you might not want it to happen. In any case, the language provides you the opportunity to explicitly request such initialization.
At the moment of declaration you can use initializers and simply do this
int array[10][20] = {};
or, for a dynamically allocated array
int (*array)[20] = new int[10][20]();
and this will give a zero-initialized array. No loops necessary.
But if you want to zero-out an existing array, then you will indeed have to use something more elaborate. In case of integer types a good old memset will work. For pointer or floating-point types the situation is in general case more complicated. memset might work or it might not work, depending on the implementation-defined properties. In any case, the standard library can help you to reduce the number of explicit loops by providing such loop wrappers as std::fill.
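As a sketch of the std::fill route for an existing array: the function template below (zero_2d is a hypothetical name) keeps one explicit loop over the rows and lets std::fill handle each row. It assigns T() to every element, so it also works for floating-point and pointer element types where memset is not guaranteed to produce zero values.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>

// Reset every element of an existing 2D C-style array to T().
template <typename T, std::size_t Rows, std::size_t Cols>
void zero_2d(T (&grid)[Rows][Cols]) {
    for (auto& row : grid) {
        std::fill(std::begin(row), std::end(row), T());
    }
}
```

For an array declared as double grid[10][20], the call is simply zero_2d(grid).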
Depends how you create it.
Two-dimensional vector: to reset an existing one, yes, you need to loop over its rows. On construction, however, the elements are already value-initialized (zero for integers; classes will call the default ctor).
Two-dimensional array? No, you can memset or bzero at once as the memory is all contiguous, whether using malloc or new.
You can simply use memset to put zeros everywhere:
memset(pointer, value_to_put, num_bytes);
So in this case, you would have something along the lines of
memset(myArray, 0, sizeof(arrayElement) * width * height);
Only C-style arrays of scalar types have the option to be created without initialization. To initialize such an array to zero, you can just provide an empty initializer:
int a[3][3] = {}; // 3x3 zeroes
As you're coming from 'other higher languages', consider less low-level types for your 2D data (there are multiple matrix libraries: boost.ublas and Eigen probably the most popular, and of course there are multi-arrays in boost too)

How to use a std::vector in a C function

A C function expects an array of buffers to be in scope at runtime. e.g.
char values[x][y]
The C function will populate the buffers
I would like to use a dynamic array so I don't have to hard code the dimensions
How do I use a std::vector in this situation?
Just to be clear, I am using C++. The C function is contained in a library that I cannot modify.
If you just want to pass the dynamic array encapsulated in a std::vector to a C routine, you can pass a pointer to the head of the underlying array:
std::vector<char> myvector;
// size-up myvector as needed
foo(&myvector[0]); // pass a pointer to start of myvector to function foo
The C++ standard ensures that the underlying array in std::vector is always contiguous.
Hope this helps.
EDIT: While the declaration char values[x][y] creates an "array of arrays" the memory for values will actually just be a contiguous block, essentially char linear_values[x * y].
If you size your std::vector to include a count of x * y elements it should give you the same underlying dynamically allocated array space.
The C function will access the array in row-major order, so the first row of elements will come first, followed by the second full row, etc.
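A sketch of the x * y sizing described above. The function fill_buffers is a hypothetical stand-in for the library routine, and it assumes the routine accepts a flat char* plus the two dimensions; a real library may declare its parameter differently. row_ptr is likewise a hypothetical helper for reading one row back.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the C library function: fills each row with a
// NUL-terminated string. Assumes cols >= 1.
extern "C" void fill_buffers(char* values, std::size_t rows, std::size_t cols) {
    for (std::size_t r = 0; r < rows; ++r) {
        std::memset(values + r * cols, 'a' + static_cast<int>(r % 26), cols - 1);
        values[r * cols + cols - 1] = '\0';
    }
}

// Hypothetical helper: start of row r in the row-major flat buffer.
inline char* row_ptr(std::vector<char>& buf, std::size_t cols, std::size_t r) {
    return buf.data() + r * cols;
}
```

With rows and cols chosen at runtime, the buffer is created as std::vector<char> buf(rows * cols) and passed as fill_buffers(buf.data(), rows, cols).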
C doesn't have standard data structures libraries.
If you really want all the functionality of a vector, and it's not for something critical, you can probably find someone's pet implementation of a straight C vector online and just use that.
If it is critical, write your own. It's not too hard, and can be quite useful.
If you just want a dynamically growing array, it's easy to emulate that behavior of a vector using the realloc function, which extends the dimensions of a heap-allocated array. Start with a small array, and grow as needed when you reach the end. It's more efficient to grow in big chunks, but if you have some idea of what your data looks like you could grow it in a different way. A common method is doubling the array size every time you run out.
You can get the details of realloc at:
http://www.cplusplus.com/reference/clibrary/cstdlib/realloc/
or, on a *nix system:
man realloc
You can't.
By definition, C knows nothing of any of the required components of a std::vector, including, but not limited to:
C does not have namespaces, so it can't understand the std namespace.
C does not have templates, so it can't understand the std::vector<T> type.
Essentially, you need what looks like a C function, but that is, for all intents and purposes, a C++ function.
The simplest way to achieve this is probably to write what looks like a C function, using C++, and running the whole mess through a C++ compiler rather than a C compiler.

Type aliasing and dynamically allocated arrays

I'm trying to facilitate automatic vectorization by the compiler in the blitz++ array library. For this reason, I'd like to present a view of the array data that is in chunks of fixed-length vectors, which are already vectorized well. However, I can't figure out what the type aliasing rules imply in conjunction with dynamically allocated arrays.
Here's the idea. An array currently consists of
T_numtype* restrict data_;
Operations are done by looping over these data. What I would like to do is present an alternative view of this array as an array of TinyVector<T_numtype, N>, which is a fixed-length vector whose operations are totally vectorized using the expression template machinery. The idea would be that an L-length array should be either T_numtype[L] or TinyVector<T_numtype, N>[L/N]. Is there a way to accomplish this without running afoul of the type aliasing rules?
For a statically allocated array, one would do
union {
T_numtype data_[L];
TinyVector<T_numtype, N> vectors_[L/N]; // member name added for illustration
};
The closest I could think of is to define
typedef union {
T_numtype data_[N];
TinyVector<T_numtype, N> vector_; // member name added for illustration
} u;
u* data_;
and then allocate it with
data_ = new u[L/N];
But it seems that now I have given up my right to address the entire array as a flat array of T_numtype, so to access a particular element I would need to do data_[i/N].data_[i%N], which is a lot more complicated.
So, is there a way to legally create a union of T_numtype data_[L] and TinyVector<T_numtype, N>[L/N] where L is a dynamically determined size?
(I'm aware that there are additional alignment concerns, i.e. N must be a value that is the same as the alignment of the TinyVector member, otherwise there will be holes in the array.)
Aliasing is hard to make legal. However, if some "operations are done by looping over these data.", do those operations require that these data are exactly an array of T_numtype?
It may be better to wrap the data in a class with one data member of type TinyVector<T_numtype, N>[L/N], or even std::vector<TinyVector<T_numtype, N> > since L is apparently determined at runtime, and expose a pair of iterators for those operations that want to loop over the entire data as a single sequence.
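A rough sketch of such a wrapper, under several assumptions: Chunk is a hypothetical stand-in for TinyVector<T_numtype, N> (this is not the blitz++ API), ChunkedArray is a hypothetical name, and flat access is exposed through operator[] rather than the iterator pair suggested above, purely to keep the example short. No aliasing is involved because every element is reached through its chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a fixed-length vector such as TinyVector<T, N>.
template <typename T, std::size_t N>
struct Chunk {
    T v[N];
};

// Stores the data as chunks, but also lets callers index it as one sequence.
template <typename T, std::size_t N>
class ChunkedArray {
public:
    // Rounds up so that a length not divisible by N still fits.
    explicit ChunkedArray(std::size_t length)
        : length_(length), chunks_((length + N - 1) / N) {}

    std::size_t size() const { return length_; }
    std::size_t chunk_count() const { return chunks_.size(); }

    // Flat view: element i lives in chunk i / N at position i % N.
    T& operator[](std::size_t i) {
        assert(i < length_);
        return chunks_[i / N].v[i % N];
    }
    const T& operator[](std::size_t i) const {
        assert(i < length_);
        return chunks_[i / N].v[i % N];
    }

    // Chunked view, for operations that work on whole fixed-length vectors.
    Chunk<T, N>& chunk(std::size_t c) {
        assert(c < chunks_.size());
        return chunks_[c];
    }

private:
    std::size_t length_;
    std::vector<Chunk<T, N>> chunks_;  // elements are value-initialized
};
```

The index arithmetic is the same data_[i/N].data_[i%N] expression from the question, but hidden behind the wrapper so callers do not have to write it.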