Only one array without a size allowed per struct? - c++

I was writing a struct to describe a constant value I needed, and noticed something strange.
namespace res{
namespace font{
struct Structure{
struct Glyph{
int x, y, width, height, easement, advance;
};
int glyphCount;
unsigned char asciiMap[]; // <-- always generates an error
Glyph glyphData[]; // <-- never generates an error
};
const Structure system = {95,
{
// mapping data
},
{
// glyph spacing data
}
}; // system constructor
} // namespace font
} // namespace res
The last two members of Structure, the unsized arrays, do not stop the compiler if they are by themselves. But if they are both included in the struct's definition, it causes an error, saying the "type is incomplete"
This stops being a problem if I give the first array a size. Which isn't a problem in this case, but I'm still curious...
My question is, why can I have one unsized array in my struct, but two cause a problem?

In standard C++, you can't do this at all, although some compilers support it as an extension.
In C, every member of a struct needs to have a fixed position within the struct. This means that the last member can have an unknown size; but nothing can come after it, so there is no way to have more than one member of unknown size.
If you do take advantage of your compilers non-standard support for this hack in C++, then beware that things may go horribly wrong if any member of the struct is non-trivial. An object can only be "created" with a non-empty array at the end by allocating a block of raw memory and reinterpreting it as this type; if you do that, no constructors or destructors will be called.

You are using a non-standard microsoft extension. C11 (note: C, not C++) allows the last array in a structure to be unsized (read: a maximum of one arrays):
A Microsoft extension allows the last member of a C or C++ structure or class to be a variable-sized array. These are called unsized arrays. The unsized array at the end of the structure allows you to append a variable-sized string or other array, thus avoiding the run-time execution cost of a pointer dereference.
// unsized_arrays_in_structures1.cpp
// compile with: /c
struct PERSON {
unsigned number;
char name[]; // Unsized array
};
If you apply the sizeof operator to this structure, the ending array size is considered to be 0. The size of this structure is 2 bytes, which is the size of the unsigned member. To get the true size of a variable of type PERSON, you would need to obtain the array size separately.
The size of the structure is added to the size of the array to get the total size to be allocated. After allocation, the array is copied to the array member of the structure, as shown below:

The compiler needs to be able to decide on the offset of every member within the struct. That's why you're not allowed to place any further members after an unsized array. It follows from this that you can't have two unsized arrays in a struct.

It is an extension from Microsoft, and sizeof(structure) == sizeof(structure_without_variable_size_array).
I guess they use the initializer to find the size of the array. If you have two variable size arrays, you can't find it (equivalent to find one unique solution of a 2-unknown system with only 1 equation...)

Arrays without a dimension are not allowed in a struct,
period, at least in C++. In C, the last member (and only the
last) may be declared without a dimension, and some compilers
allow this in C++, as an extension, but you shouldn't count on
it (and in strict mode, they should at least complain about it).
Other compilers have implemented the same semantics if the last
element had a dimension of 0 (also an extension, requiring
a diagnostic in strict mode).
The reason for limiting incomplete array types to the last
element is simple: what would be the offset of any following
elements? Even when it is the last element, there are
restrictions to the use of the resulting struct: it cannot be
a member of another struct or an array, for example, and
sizeof ignores this last element.

Related

Alternative to zero-sized array in embedded C++

The Multiboot Specification has structures like this:
struct multiboot_tag_mmap
{
multiboot_uint32_t type;
multiboot_uint32_t size;
multiboot_uint32_t entry_size;
multiboot_uint32_t entry_version;
struct multiboot_mmap_entry entries[0];
};
The intent seems to be that the array size can vary. The information is not known until passed along by the boot loader. In hosted C++, the suggested advice is to "use vector". Well, I can't do that. The alternative is to use dynamic allocation, but that would require implementing a significant chunk of the kernel (paging, MMU, etc.) before I even have the memory map information. A bit of a chicken or egg problem.
The "hack" is to just enable extensions with gnu++11. But I try to avoid using extensions as much as possible to avoid C-ish code or code that could potentially lead to undefined behavior. The more portable the code is, the less chance of bugs in my opinion.
Finally, you iterate over the memory map like this:
for (mmap = ((struct multiboot_tag_mmap *) tag)->entries;
(multiboot_uint8_t *) mmap
< (multiboot_uint8_t *) tag + tag->size;
mmap = (multiboot_memory_map_t *)
((unsigned long) mmap
+ ((struct multiboot_tag_mmap *) tag)->entry_size))
So the size of the structure is tag->size.
I can modify the multiboot header so long as the semantics are the same. The point is how it looks to the bootloader. What can I do?
Instead of 0-sized array, you can use 1-sized array:
struct multiboot_tag_mmap
{
...
struct multiboot_mmap_entry entries[1];
};
This will change only result of sizeof(struct multiboot_tag_mmap), but it shouldn't be used in any case: size of allocated structure should be computed as
offsetof(struct multiboot_tag_mmap, entries) + <num-of-entries> * sizeof(struct multiboot_mmap_entry)
Alignment of the map structure doesn't depends on the number of elements in the entries array, but on the entry type.
Strictly confirming alternative:
If there is known bounary for array size, one can use this boundary for type declaration:
struct multiboot_tag_mmap
{
...
struct multiboot_mmap_entry entries[<UPPER-BOUNDARY>];
};
For such declaration all possible issues described below are not applied.
NOTE about elements accessing:
For accessing elements (above the first one) in such flexible array one need to declare new pointer variable:
struct multiboot_mmap_entry* entries = tag->entries;
entries[index] = ...; // This is OK.
instead of using entries field directly:
tag->entries[index] = ...; // WRONG! May spuriously fail!
The thing is that compiler, knowing that the only one element exists in the entries field array, may optimize last case it to:
tag->entries[0] = ...; // Compiler is in its rights to assume index to have the only allowed value
Issues about standard confirmance: With flexible array approach, there is no object of type struct multiboot_tag_mmap in the memory(in the heap or on the stack). All we have is a pointer of this type, which is never dereferenced (e.g. for making full copy of the object). Similarly, there is no object of the array type struct multiboot_mmap_entry[1], corresponded to the entries field of the structure, this field is used only for conversion to generic pointer of type struct multiboot_mmap_entry*.
So, phrase in the C standard, which denotes Undefine Behavior
An object is assigned to an inexactly overlapping object or to an exactly overlapping object with incompatible type
is inapplicable for accessing entries array field using generic pointer: there is no overlapping object here.

Storing a Dynamic Array of Structures

I have been working on a project which utilizes a dynamic array of structures. To avoid storing the number of structures in its own variables (the count of structures), I have been using an array of pointers to the structure variables with a NULL terminator.
For example, let's say my structure type is defined as:
typedef struct structure_item{
/* ... Structure Variables Here ... */
} item_t;
Now let's say my code has item_t **allItems = { item_1, item_2, item_3, ..., item_n, NULL }; and all item_#s are of the type item_t *.
Using this setup, I then do not have to keep track of another variable which tells me the total number of items. Instead, I can determine the total number of items as needed by saying:
int numberOfStructures;
for( numberOfStructures = 0;
*(allItems + numberOfStructures) != NULL;
numberOfStructures++
);
When this code executes, it counts the total number of pointers before NULL.
As a comparison, this system is similar to C-style strings; whereas tracking the total number of structures would be similar to a Pascal-style string. (Because C uses a NULL terminated array of characters vs. Pascal which tracks the length of its array of characters.)
My question is rather simple, is an array of pointers (pointer to pointer to struct) really necessary or could this be done with an array of structs (pointer to struct)? Can anybody provide better ways to handle this?
Note: it is important that the solution is compatible with both C and C++. This is being used in a wrapper library which is wrapping a C++ library for use in standard C.
Thank you all in advance!
What you need is a sentinel value, a recognizable valid value that means "nothing". For pointers, the standard sentinel value is NULL.
If you want to use your structs directly, you will need to decide on a sentinel value of type item_t, and check for that. Your call.
Yes, it is possible to have an array of structs, and (at least) one of those a defined sentinel (which is that the '\0' used at the end of strings, and the NULL pointer in your case).
What you need to do, for your struct type, is reserve one or more possible values of that struct (composed of the set of values of its members) to indicate a sentinel.
For example, let's say we have a struct type
struct X {int a; char *p};
then define a function
int is_sentinel(struct X x)
{
return x.p == NULL;
}
This will mean any struct X for which the member p is NULL can be used as a sentinel (and the member a would not matter in this case).
Then just loop looking for a sentinel.
Note: to be compatible in both C and C++, the struct type needs to be compatible (e.g. POD).

Accessing consecutive members with a single pointer

I want to access continuously declared member arrays of the same type with a single pointer.
So for example say I have :
struct S
{
double m1[2];
double m2[2];
}
int main()
{
S obj;
double *sp = obj.m1;
// Code that maybe unsafe !!
for (int i(0); i < 4; ++i)
*(sp++) = i; // *
return 0;
}
Under what circumstances is line (*) problematic ?
I know there's for sure a problem when virtual functions are present but I need a more structured answer than my assumptions
You can be sure that the members of the struct are stored in a contiguos block of bytes, in the order they appear. Besides, the elements of the arrays are contiguous. So, it seems that everything is fine.
The problem here is that there is no standard way of knowing if there is padding bytes between consecutive members in the struct.
So, it is unsafe to assume that there is not padding bytes at all.
If you can be plenty sure, for some particular reason, that there are not padding bytes, then the 4 double elements will be contiguous, as you want.
The C++ standard makes certain guarantees about the layout of "plain old data" (or in C++11, standard layout) types. For the most part, these inherit from how C treated such data.
What follows only applies to "plain old data"/"standard layout" structures and data.
If you have two structs with the same initial order and type of arguments, casting a pointer to one to a pointer to the other and accessing their common initial prefix is valid, and will access the corresponding field. This is known as "layout compatible". This also applies if you have a structure X and a structure Y, and X is the first element of the structure Y -- a pointer to Y can be cast to a pointer to X, and it will access the fields of the X substructure in Y.
Now, while it is a common assumption, I am unaware of a requirement of either C or C++ that an array and a structure starting with fields of the same type and count are layout compatible with an array.
Your case is somewhat similar, in that we have two arrays adjacent to each other in a structure, and you are treating it as one large array of size equal to the sum of those two arrays size. It is a relatively common and safe assumption that it works, but I am unaware of a guarantee in the standard that it actually works.
In this kind of undefined behavior, you have to examine your particular compilers guarantees (de facto or explicit) about layout of plain old data/standard layout data, as the C++ standard does not guarantee your code does what you want it to do.

Why do C and C++ compilers allow array lengths in function signatures when they're never enforced?

This is what I found during my learning period:
#include<iostream>
using namespace std;
int dis(char a[1])
{
int length = strlen(a);
char c = a[2];
return length;
}
int main()
{
char b[4] = "abc";
int c = dis(b);
cout << c;
return 0;
}
So in the variable int dis(char a[1]) , the [1] seems to do nothing and doesn't work at
all, because I can use a[2]. Just like int a[] or char *a. I know the array name is a pointer and how to convey an array, so my puzzle is not about this part.
What I want to know is why compilers allow this behavior (int a[1]). Or does it have other meanings that I don't know about?
It is a quirk of the syntax for passing arrays to functions.
Actually it is not possible to pass an array in C. If you write syntax that looks like it should pass the array, what actually happens is that a pointer to the first element of the array is passed instead.
Since the pointer does not include any length information, the contents of your [] in the function formal parameter list are actually ignored.
The decision to allow this syntax was made in the 1970s and has caused much confusion ever since...
The length of the first dimension is ignored, but the length of additional dimensions are necessary to allow the compiler to compute offsets correctly. In the following example, the foo function is passed a pointer to a two-dimensional array.
#include <stdio.h>
void foo(int args[10][20])
{
printf("%zd\n", sizeof(args[0]));
}
int main(int argc, char **argv)
{
int a[2][20];
foo(a);
return 0;
}
The size of the first dimension [10] is ignored; the compiler will not prevent you from indexing off the end (notice that the formal wants 10 elements, but the actual provides only 2). However, the size of the second dimension [20] is used to determine the stride of each row, and here, the formal must match the actual. Again, the compiler will not prevent you from indexing off the end of the second dimension either.
The byte offset from the base of the array to an element args[row][col] is determined by:
sizeof(int)*(col + 20*row)
Note that if col >= 20, then you will actually index into a subsequent row (or off the end of the entire array).
sizeof(args[0]), returns 80 on my machine where sizeof(int) == 4. However, if I attempt to take sizeof(args), I get the following compiler warning:
foo.c:5:27: warning: sizeof on array function parameter will return size of 'int (*)[20]' instead of 'int [10][20]' [-Wsizeof-array-argument]
printf("%zd\n", sizeof(args));
^
foo.c:3:14: note: declared here
void foo(int args[10][20])
^
1 warning generated.
Here, the compiler is warning that it is only going to give the size of the pointer into which the array has decayed instead of the size of the array itself.
The problem and how to overcome it in C++
The problem has been explained extensively by pat and Matt. The compiler is basically ignoring the first dimension of the array's size effectively ignoring the size of the passed argument.
In C++, on the other hand, you can easily overcome this limitation in two ways:
using references
using std::array (since C++11)
References
If your function is only trying to read or modify an existing array (not copying it) you can easily use references.
For example, let's assume you want to have a function that resets an array of ten ints setting every element to 0. You can easily do that by using the following function signature:
void reset(int (&array)[10]) { ... }
Not only this will work just fine, but it will also enforce the dimension of the array.
You can also make use of templates to make the above code generic:
template<class Type, std::size_t N>
void reset(Type (&array)[N]) { ... }
And finally you can take advantage of const correctness. Let's consider a function that prints an array of 10 elements:
void show(const int (&array)[10]) { ... }
By applying the const qualifier we are preventing possible modifications.
The standard library class for arrays
If you consider the above syntax both ugly and unnecessary, as I do, we can throw it in the can and use std::array instead (since C++11).
Here's the refactored code:
void reset(std::array<int, 10>& array) { ... }
void show(std::array<int, 10> const& array) { ... }
Isn't it wonderful? Not to mention that the generic code trick I've taught you earlier, still works:
template<class Type, std::size_t N>
void reset(std::array<Type, N>& array) { ... }
template<class Type, std::size_t N>
void show(const std::array<Type, N>& array) { ... }
Not only that, but you get copy and move semantic for free. :)
void copy(std::array<Type, N> array) {
// a copy of the original passed array
// is made and can be dealt with indipendently
// from the original
}
So, what are you waiting for? Go use std::array.
It's a fun feature of C that allows you to effectively shoot yourself in the foot if you're so inclined. I think the reason is that C is just a step above assembly language. Size checking and similar safety features have been removed to allow for peak performance, which isn't a bad thing if the programmer is being very diligent. Also, assigning a size to the function argument has the advantage that when the function is used by another programmer, there's a chance they'll notice a size restriction. Just using a pointer doesn't convey that information to the next programmer.
First, C never checks array bounds. Doesn't matter if they are local, global, static, parameters, whatever. Checking array bounds means more processing, and C is supposed to be very efficient, so array bounds checking is done by the programmer when needed.
Second, there is a trick that makes it possible to pass-by-value an array to a function. It is also possible to return-by-value an array from a function. You just need to create a new data type using struct. For example:
typedef struct {
int a[10];
} myarray_t;
myarray_t my_function(myarray_t foo) {
myarray_t bar;
...
return bar;
}
You have to access the elements like this: foo.a[1]. The extra ".a" might look weird, but this trick adds great functionality to the C language.
To tell the compiler that myArray points to an array of at least 10 ints:
void bar(int myArray[static 10])
A good compiler should give you a warning if you access myArray [10]. Without the "static" keyword, the 10 would mean nothing at all.
This is a well-known "feature" of C, passed over to C++ because C++ is supposed to correctly compile C code.
Problem arises from several aspects:
An array name is supposed to be completely equivalent to a pointer.
C is supposed to be fast, originally developerd to be a kind of "high-level Assembler" (especially designed to write the first "portable Operating System": Unix), so it is not supposed to insert "hidden" code; runtime range checking is thus "forbidden".
Machine code generrated to access a static array or a dynamic one (either in the stack or allocated) is actually different.
Since the called function cannot know the "kind" of array passed as argument everything is supposed to be a pointer and treated as such.
You could say arrays are not really supported in C (this is not really true, as I was saying before, but it is a good approximation); an array is really treated as a pointer to a block of data and accessed using pointer arithmetic.
Since C does NOT have any form of RTTI You have to declare the size of the array element in the function prototype (to support pointer arithmetic). This is even "more true" for multidimensional arrays.
Anyway all above is not really true anymore :p
Most modern C/C++ compilers do support bounds checking, but standards require it to be off by default (for backward compatibility). Reasonably recent versions of gcc, for example, do compile-time range checking with "-O3 -Wall -Wextra" and full run-time bounds checking with "-fbounds-checking".
C will not only transform a parameter of type int[5] into *int; given the declaration typedef int intArray5[5];, it will transform a parameter of type intArray5 to *int as well. There are some situations where this behavior, although odd, is useful (especially with things like the va_list defined in stdargs.h, which some implementations define as an array). It would be illogical to allow as a parameter a type defined as int[5] (ignoring the dimension) but not allow int[5] to be specified directly.
I find C's handling of parameters of array type to be absurd, but it's a consequence of efforts to take an ad-hoc language, large parts of which weren't particularly well-defined or thought-out, and try to come up with behavioral specifications that are consistent with what existing implementations did for existing programs. Many of the quirks of C make sense when viewed in that light, particularly if one considers that when many of them were invented, large parts of the language we know today didn't exist yet. From what I understand, in the predecessor to C, called BCPL, compilers didn't really keep track of variable types very well. A declaration int arr[5]; was equivalent to int anonymousAllocation[5],*arr = anonymousAllocation;; once the allocation was set aside. the compiler neither knew nor cared whether arr was a pointer or an array. When accessed as either arr[x] or *arr, it would be regarded as a pointer regardless of how it was declared.
One thing that hasn't been answered yet is the actual question.
The answers already given explain that arrays cannot be passed by value to a function in either C or C++. They also explain that a parameter declared as int[] is treated as if it had type int *, and that a variable of type int[] can be passed to such a function.
But they don't explain why it has never been made an error to explicitly provide an array length.
void f(int *); // makes perfect sense
void f(int []); // sort of makes sense
void f(int [10]); // makes no sense
Why isn't the last of these an error?
A reason for that is that it causes problems with typedefs.
typedef int myarray[10];
void f(myarray array);
If it were an error to specify the array length in function parameters, you would not be able to use the myarray name in the function parameter. And since some implementations use array types for standard library types such as va_list, and all implementations are required to make jmp_buf an array type, it would be very problematic if there were no standard way of declaring function parameters using those names: without that ability, there could not be a portable implementation of functions such as vprintf.
It's allowed for compilers to be able to check whether the size of array passed is the same as what expected. Compilers may warn an issue if it's not the case.

What is the syntax for an array type?

Is it type[]? For example, could I have
T<int[]>;
for some template T.
The type of an "array of type T" is T [dimension], which is what you could pass as template parameters. E.g.:
someTemplate<int [10]> t; // array type as template parameter
int a[5]; // array of 5 ints named 'a'
Arrays need to have a dimension which must be greater than 0. This means that e.g. U u[]; is illegal.
There are cases that might seem like exceptions, the first being parameters:
void f(T[]);
This is a special rule for parameters and f() is actually equivalent to the following:
void f(T*);
Then there is direct inialization of arrays:
int a[] = { 1, 2, 3, 4 };
Here the array size is implicitly given through the number of elements in the initializer, thus the type of a is int[4].
There are also incomplete array types without specificied bounds, but you can't directly create instances of these (see Johannes answer for more):
template<class T> struct X { typedef T type; };
X<int[]>::type a = { 1, 2, 3 };
If you are looking for dynamic arrays, prefer standard containers like std::vector<T> instead.
There are two syntaxes to denote array types. The first is the type-id syntax and is used everywhere where the language expects a compile time type, which looks like:
T[constant-expression]
T[]
This specifies an array type that, in the first form, has a number of elements given by an integer constant expression (means it has to be known at compile time). In the second form, it specifies an array type with an unknown number of elements. Similar to class types that you declare without a body, such an array type is said to be incomplete, and you cannot create arrays of that type
// not valid: what size would it have?
int a[];
You can, however, specify that type. For example you may typedef it
typedef int unknown_int_array[];
In the same manner, you may specify it as a template type argument, so the answer to your question is yes you can pass such a type specifier to a template. Notice that i talk about specifiers here, because the form you use here is not the type itself.
The second way is using the new-type-id syntax which allows denoting runtime types by having non-constant bounds
T[expression]
This allows passing variables as element count, and also allows passing a zero. In such a case, a zero element array is created. That syntax is only usable with the new operator for supporting dynamic arrays.
If possible, you might consider instead using dynamic arrays, and passing in a pointer as the templated type. Such as...
T<int*> myVar;
This started as a comment to Georg's answer, but it ran a bit long...
It seems that you may be missing some key abstraction in your mental model of arrays (at least C-style ones). Local arrays are allocated on the stack with a hard-coded size. If you have an array inside a class or struct, the space for the array is part of the object itself (whether on the stack or heap). Global arrays may even be represented directly in the size of the executable.
This means that any time you want to use an array, you must specify its size to the compiler. The only reason you can leave the brackets empty in a parameter list is because functions treat array parameters as pointers. The function would hardly be useful if it could only operate on one size of array.
Templates are no exception. If you want the size of the templated array to vary, you can add an extra template parameter. You still have to specify the size at compile time for any given instance, though.
The syntax for declaring arrays is
<type> <variable>[<size>];
When using a template the declaration is, in example
template <class T>
T var[4];