Class with incomplete char array [duplicate] - c++

Why does C permit this:
typedef struct s
{
int arr[];
} s;
where the array arr has no size specified?

This is a C99 feature called a flexible array member. Its main purpose is to allow variable-length-array-like behavior inside a struct, and R.. in this answer to another question on flexible array members provides a list of benefits of using flexible arrays over pointers. The draft C99 standard, in section 6.7.2.1 Structure and union specifiers, paragraph 16, says:
As a special case, the last element of a structure with more than one named member may
have an incomplete array type; this is called a flexible array member. In most situations,
the flexible array member is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more trailing padding than
the omission would imply. [...]
So if you had an s*, you would allocate space for the array in addition to the space required for the struct. Note that the quoted wording requires at least one other named member, so in practice the structure would have other members:
s *s1 = malloc( sizeof(struct s) + n*sizeof(int) ) ;
The draft standard actually has an instructive example in paragraph 17:
EXAMPLE After the declaration:
struct s { int n; double d[]; };
the structure struct s has a flexible array member d. A typical way to use this
is:
int m = /* some value */;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));
and assuming that the call to malloc succeeds, the object pointed to by p
behaves, for most purposes, as if p had been declared as:
struct { int n; double d[m]; } *p;
(there are circumstances in which this equivalence is broken; in particular, the
offsets of member d might not be the same).

You are probably looking for flexible arrays in C99. A flexible array member is a member of unknown size at the end of a struct.
As a special case, the last element of a structure with more than one
named member may have an incomplete array type; this is called a
flexible array member. In most situations, the flexible array member
is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more
trailing padding than the omission would imply.
You may also look at the reason for the struct hack in the first place.
It's not clear if it's legal or portable, but it is rather popular. An implementation of the technique might look something like this:
#include <stdlib.h>
#include <string.h>

/* Declaration implied by the "-1 for initial [1]" comment below. */
struct name {
    int namelen;
    char namestr[1];
};

struct name *makename(char *newname)
{
    struct name *ret =
        malloc(sizeof(struct name)-1 + strlen(newname)+1);
    /* -1 for initial [1]; +1 for \0 */
    if(ret != NULL) {
        ret->namelen = strlen(newname);
        strcpy(ret->namestr, newname);
    }
    return ret;
}
This function allocates an instance of the name structure with the
size adjusted so that the namestr field can hold the requested name
(not just one character, as the structure declaration would suggest).
Despite its popularity, the technique is also somewhat notorious -
Dennis Ritchie has called it "unwarranted chumminess with the C implementation." An official interpretation has deemed that it is NOT
strictly conforming with the C Standard, although it does seem to work
under all known implementations. Compilers that check array bounds
carefully might issue warnings.
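For comparison, the same function can be written with a C99 flexible array member, which avoids the conformance question raised above. This is a minimal sketch rather than the FAQ's own code: the struct layout (size_t length, const parameter) is an assumption made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical C99 counterpart of the FAQ's struct name. */
struct name {
    size_t namelen;
    char namestr[];   /* flexible array member: must be the last member */
};

/* Returns a heap-allocated struct name holding a copy of newname,
   or NULL if malloc fails. The caller releases it with free(). */
struct name *makename(const char *newname)
{
    size_t len = strlen(newname);
    /* sizeof(struct name) does not count namestr, so add len + 1
       bytes for the characters and the terminating '\0'. */
    struct name *ret = malloc(sizeof(struct name) + len + 1);
    if (ret != NULL) {
        ret->namelen = len;
        memcpy(ret->namestr, newname, len + 1);
    }
    return ret;
}
```

No "-1" adjustment is needed here, because the flexible member contributes nothing to sizeof(struct name) beyond possible padding.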

Related

Is there a standard-compliant way to determine the alignment of a non-static member?

Suppose I have some structure S and a non-static member member, as in this example:
struct S { alignas(alignof(void *)) char member[sizeof(void *)]; };
How do you get the alignment of member?
The alignof operator can only be applied to complete types, not expressions [in 7.6.2.5.1], although GCC accepts alignof(S::member) as an extension and Clang supports it too.
What is the "language-lawyerly" standard way to do it without this restriction?
Also, sizeof allows expression arguments; is there a reason for the asymmetry?
The practical concern is being able to get the alignment of members of template structures: you can use decltype to get their type and sizeof to get their size, but then you also need the alignment.
The alignment of a type or variable is a description of what memory addresses the variable can inhabit—the address must be a multiple of the alignment*. However, for data-members, the address of the data-member can be any K * alignof(S) + offsetof(S, member). Let's define the alignment of a data-member to be the maximum possible integer E such that &some_s.member is always a multiple of E.
Given a type S with member member, let A = alignof(S), O = offsetof(S, member).
The valid addresses of S{}.member are V = K * A + O for some integer K.
V = K * A + O = gcd(A, O) * (K * A / gcd(A, O) + O / gcd(A, O)).
Taking K = 0 and K = 1 gives the addresses O and A + O, and gcd(O, A + O) = gcd(A, O), so no larger factor divides every valid address.
Thus, gcd(A, O) is the best factor valid for unknown K.
In other words, "alignof(S.member)" == gcd(alignof(S), offsetof(S, member)).
Note that this alignment is always a power of two, as alignof(S) is always a power of two.
*: In my brief foray into the standard, I couldn't find this guarantee, meaning that the address of the variable could be K * alignment + some_integer. However, this doesn't affect the final result.
We can define a macro to compute the alignment of a data-member:
#include <cstddef> // for offsetof(...)
#include <numeric> // for std::gcd
// Must be a macro, as `offsetof` is a macro because the member name must be known
// at preprocessing time.
#define ALIGNOF_MEMBER(cls, member) (::std::gcd(alignof(cls), offsetof(cls, member)))
This is only guaranteed valid for standard layout types, as offsetof is only guaranteed valid for standard layout types. If the class is not standard layout, this operation is conditionally supported.
Example:
#include <cstddef>
#include <numeric>
struct S1 { char foo; alignas(alignof(void *)) char member[sizeof(void *)]; };
struct S2 { char foo; char member[sizeof(void *)]; };
#define ALIGNOF_MEMBER(cls, member) (::std::gcd(alignof(cls), offsetof(cls, member)))
int f1() { return ALIGNOF_MEMBER(S1, member); } // returns alignof(void *) == 8
int f2() { return ALIGNOF_MEMBER(S1, foo); } // returns 8*
int f3() { return ALIGNOF_MEMBER(S2, member); } // returns 1
// *: alignof(S1) == 8, so the `foo` member must always be at an alignment of 8
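For readers working in C rather than C++, the same gcd idea can be sketched with C11's _Alignof and offsetof. This is only an illustration: gcd_size is a hypothetical helper (C has no std::gcd), and the structs mirror S1 and S2 from the example above, written with _Alignas.

```c
#include <stddef.h>   /* offsetof, size_t */

/* Euclid's algorithm on size_t; gcd_size(a, 0) returns a. */
static size_t gcd_size(size_t a, size_t b)
{
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Largest value that every possible address of type.member is a multiple of. */
#define ALIGNOF_MEMBER(type, member) \
    gcd_size(_Alignof(type), offsetof(type, member))

struct S1 { char foo; _Alignas(void *) char member[sizeof(void *)]; };
struct S2 { char foo; char member[sizeof(void *)]; };
```

With these definitions, ALIGNOF_MEMBER(struct S1, member) evaluates to _Alignof(void *), and ALIGNOF_MEMBER(struct S2, member) evaluates to 1, matching the C++ results above.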
I don't think it's possible. In the general case, declaring a non-static data member with an alignment specifier might not change the layout of the class that contains it. In the below example, if (as is most common) int has a size and alignment of 4, the structs S1 and S2 are likely to have the same layout, with a total size of 8 bytes. Each is likely to have 3 bytes of padding at the end:
struct S1 {
int x;
char y;
};
struct S2 {
int x;
alignas(4) char y;
};
This prevents us from using any information about the layout of the struct to determine the alignment of y. And as the OP noted, alignof(S::member) isn't valid.
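The claim that the two structs end up with the same layout can be checked directly. The sketch below is written in C11 and uses _Alignas(int) instead of alignas(4) so that it does not depend on int being 4 bytes; the struct names and the same_layout helper are hypothetical. The result is what I would expect on common ABIs, not something the standard promises.

```c
#include <stddef.h>   /* offsetof */

struct Plain   { int x; char y; };
struct Aligned { int x; _Alignas(int) char y; };

/* Returns 1 when both structs have the same size, the same alignment,
   and place y at the same offset; returns 0 otherwise. */
static int same_layout(void)
{
    return sizeof(struct Plain) == sizeof(struct Aligned)
        && _Alignof(struct Plain) == _Alignof(struct Aligned)
        && offsetof(struct Plain, y) == offsetof(struct Aligned, y);
}
```

When same_layout() returns 1, nothing observable about the struct distinguishes the over-aligned y from the plain one, which is the point made above.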
By the way, there also isn't any way to query the alignment specifier of a regular variable. You can use the std::align function to check whether the variable is allocated at an address that is appropriately aligned for an object with alignment X, but this doesn't imply that the variable was actually declared with an alignment of X or greater. It could have been declared with an alignment less than X and coincidentally ended up allocated at an address that could have supported an object with alignment X.
Since this functionality is unsupported not only for non-static data members but also regular variables, I'm inclined to think that it's not an oversight; it's deliberately not supported because it's not useful. The compiler needs to know the alignment specifier so that it can allocate the variable or data member appropriately. That is not the programmer's job. Sure, the programmer may need to know the alignment requirement of a type in order to appropriately allocate memory for instances of that type, but you cannot, as the programmer, create additional instances of a variable, other than by triggering some condition that makes it happen automatically (e.g., continuing to the next iteration of a loop will deallocate and reallocate automatic variables in the loop's body). Nor can you, as of now, create a second class at compile time that's guaranteed to be layout-compatible with a given class, which is the main application I can think of for the hypothetical "query alignment of non-static data member" feature. I expect that, once C++ provides enough other reflection functionality so that something like that is close to possible, someone will also put forth a realistic proposal to add a way to query the alignment of a non-static data member.

C++ Zero Length Arrays in Header File

From ISO/IEC 14882:2003 8.3.4/1:
If the constant-expression (5.19) is present, it shall be an integral
constant expression and its value shall be greater than zero.
Therefore the following should not compile:
#pragma once
class IAmAClass
{
public:
IAmAClass();
~IAmAClass();
private:
int somearray[0]; // Zero sized array
};
But it does. However, the following:
#pragma once
class IAmAClass
{
public:
IAmAClass();
~IAmAClass();
private:
int somearray[0];
int var = 23; // Added this expression
};
does not compile, failing with the following error in Visual C++ (as would be expected):
error C2229: class 'IAmAClass' has an illegal zero-sized array
When the array is declared inside a function, it never compiles, in accordance with the standard.
So why does the code behave this way in a header file, where whether compilation passes or fails appears to depend on whether another declaration follows the zero-sized array declaration?
The keyword in "If the constant-expression (5.19) is present," is if. It's not, so the first version compiles.
However, such variant arrays are only permissible (and sane) when they are the last element in a struct or class, where it's expected that they'll use extra space allocated to the struct on a case-by-case basis.
If an unknown-length array were allowed before other elements, how would other code know where in memory to find those elements?
This is a Visual C++ language extension: Declaring Unsized Arrays in Member Lists. From the linked MSDN page:
Unsized arrays can be declared as the last data member in class member lists if the program is not compiled with the ANSI-compatibility option (/Za).
Edit: If the member has been declared as a zero-sized array (like int somearray[0];) instead of an array of unknown bounds (like int somearray[];), this is still a language extension, albeit a different one:
A zero-sized array is legal only when the array is the last field in a struct or union and when the Microsoft extensions (/Ze) are enabled.
This extension is similar to C99's flexible array members (C11/n1570 §6.7.2.1/18):
As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member.
and /20 contains an example:
EXAMPLE 2 After the declaration:
struct s { int n; double d[]; };
the structure struct s has a flexible array member d. A typical
way to use this is:
int m = /* some value */;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));
and assuming that the call to malloc succeeds, the object pointed to
by p behaves, for most purposes, as if p had been declared as:
struct { int n; double d[m]; } *p;
[...]

Only one array without a size allowed per struct?

I was writing a struct to describe a constant value I needed, and noticed something strange.
namespace res{
namespace font{
struct Structure{
struct Glyph{
int x, y, width, height, easement, advance;
};
int glyphCount;
unsigned char asciiMap[]; // <-- always generates an error
Glyph glyphData[]; // <-- never generates an error
};
const Structure system = {95,
{
// mapping data
},
{
// glyph spacing data
}
}; // system constructor
} // namespace font
} // namespace res
The last two members of Structure, the unsized arrays, do not stop the compiler if either one is present by itself. But if both are included in the struct's definition, the compiler reports an error saying the "type is incomplete".
This stops being a problem if I give the first array a size. Which isn't a problem in this case, but I'm still curious...
My question is, why can I have one unsized array in my struct, but two cause a problem?
In standard C++, you can't do this at all, although some compilers support it as an extension.
In C, every member of a struct needs to have a fixed position within the struct. This means that the last member can have an unknown size; but nothing can come after it, so there is no way to have more than one member of unknown size.
If you do take advantage of your compiler's non-standard support for this hack in C++, then beware that things may go horribly wrong if any member of the struct is non-trivial. An object can only be "created" with a non-empty array at the end by allocating a block of raw memory and reinterpreting it as this type; if you do that, no constructors or destructors will be called.
You are using a non-standard Microsoft extension. C11 (note: C, not C++) allows the last member of a structure to be an unsized array (read: at most one such array):
A Microsoft extension allows the last member of a C or C++ structure or class to be a variable-sized array. These are called unsized arrays. The unsized array at the end of the structure allows you to append a variable-sized string or other array, thus avoiding the run-time execution cost of a pointer dereference.
// unsized_arrays_in_structures1.cpp
// compile with: /c
struct PERSON {
unsigned number;
char name[]; // Unsized array
};
If you apply the sizeof operator to this structure, the ending array size is considered to be 0. The size of this structure is 2 bytes, which is the size of the unsigned member. To get the true size of a variable of type PERSON, you would need to obtain the array size separately.
The size of the structure is added to the size of the array to get the total size to be allocated. After allocation, the array is copied to the array member of the structure, as shown below:
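The listing that this sentence introduces did not survive in the text above. A minimal sketch of the idea, written as standard C99 with a flexible array member rather than the Microsoft extension, might look like the following; person_size and person_new are hypothetical helper names, not part of the quoted documentation.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct person {
    unsigned number;
    char name[];      /* unsized array, last member */
};

/* Total bytes needed for a person whose name has name_len characters:
   the structure itself plus the characters plus the terminating '\0'. */
static size_t person_size(size_t name_len)
{
    return sizeof(struct person) + name_len + 1;
}

/* Allocates a person and copies name into the trailing array.
   Returns NULL if malloc fails; the caller releases it with free(). */
static struct person *person_new(unsigned number, const char *name)
{
    size_t len = strlen(name);
    struct person *p = malloc(person_size(len));
    if (p != NULL) {
        p->number = number;
        memcpy(p->name, name, len + 1);
    }
    return p;
}
```

Because sizeof(struct person) ignores the trailing array, the array length has to be tracked or recomputed separately, here through strlen.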
The compiler needs to be able to decide on the offset of every member within the struct. That's why you're not allowed to place any further members after an unsized array. It follows from this that you can't have two unsized arrays in a struct.
It is an extension from Microsoft, and sizeof(structure) == sizeof(structure_without_variable_size_array).
I guess they use the initializer to find the size of the array. If you have two variable-size arrays, the sizes can't be determined (it is like looking for a unique solution to a system with two unknowns and only one equation).
Arrays without a dimension are not allowed in a struct,
period, at least in C++. In C, the last member (and only the
last) may be declared without a dimension, and some compilers
allow this in C++, as an extension, but you shouldn't count on
it (and in strict mode, they should at least complain about it).
Other compilers have implemented the same semantics if the last
element had a dimension of 0 (also an extension, requiring
a diagnostic in strict mode).
The reason for limiting incomplete array types to the last
element is simple: what would be the offset of any following
elements? Even when it is the last element, there are
restrictions to the use of the resulting struct: it cannot be
a member of another struct or an array, for example, and
sizeof ignores this last element.

What happens if I define a 0-size array in C/C++?

Just curious, what actually happens if I define a zero-length array int array[0]; in code? GCC doesn't complain at all.
Sample Program
#include <stdio.h>
int main() {
int arr[0];
return 0;
}
Clarification
I'm actually trying to figure out if zero-length arrays initialised this way, instead of being pointed at like the variable length in Darhazer's comments, are optimised out or not.
This is because I have to release some code out into the wild, so I'm trying to figure out if I have to handle cases where the SIZE is defined as 0, which happens in some code with a statically defined int array[SIZE];
I was actually surprised that GCC does not complain, which led to my question. From the answers I've received, I believe the lack of a warning is largely due to supporting old code which has not been updated with the new [] syntax.
Because I was mainly wondering about the error, I am tagging Lundin's answer as correct (Nawaz's was first, but it wasn't as complete). The others pointed out its actual use for tail-padded structures, which, while relevant, isn't exactly what I was looking for.
An array cannot have zero size.
ISO 9899:2011 6.7.6.2:
If the expression is a constant expression, it shall have a value greater than zero.
The above text applies to a plain array (paragraph 1). For a VLA (variable length array), the behavior is undefined if the expression's value is less than or equal to zero (paragraph 5). This is normative text in the C standard. A compiler is not allowed to implement it differently.
gcc -std=c99 -pedantic gives a warning for the non-VLA case.
As per the standard, it is not allowed.
However it's been current practice in C compilers to treat those declarations as a flexible array member (FAM) declaration:
C99 6.7.2.1, §16: As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member.
The standard syntax of a FAM is:
struct Array {
size_t size;
int content[];
};
The idea is that you would then allocate it so:
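#include <stdlib.h>
void foo(size_t x) {
    /* sizeof(struct Array) covers size plus any padding before content */
    struct Array *array = malloc(sizeof(struct Array) + x * sizeof(int));
    if (array == NULL) {
        return;
    }
    array->size = x;
    for (size_t i = 0; i != x; ++i) {
        array->content[i] = 0;
    }
    free(array);
}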
You might also use it statically (gcc extension):
struct Array a = { 3, { 1, 2, 3 } };
This is also known as tail-padded structures (this term predates the publication of the C99 Standard) or struct hack (thanks to Joe Wreschnig for pointing it out).
However, this syntax was standardized (and its effects guaranteed) only in C99. Before that, a constant size was necessary.
A size of 1 was the portable way to go, though it was rather strange.
A size of 0 was better at indicating intent, but it was not legal as far as the Standard was concerned and was supported only as an extension by some compilers (including gcc).
The tail padding practice, however, relies on the fact that storage is available (careful malloc) so is not suited to stack usage in general.
In Standard C and C++, zero-size arrays are not allowed.
If you're using GCC, compile with the -pedantic option. It will give a warning saying:
zero.c:3:6: warning: ISO C forbids zero-size array 'a' [-pedantic]
In the case of C++, it gives a similar warning.
It's totally illegal, and always has been, but a lot of compilers
neglect to signal the error. I'm not sure why you want to do this.
The one use I know of is to trigger a compile time error from a boolean:
char someCondition[ condition ];
If condition is false, then I get a compile time error. Because
compilers do allow this, however, I've taken to using:
char someCondition[ 2 * condition - 1 ];
This gives a size of either 1 or -1, and I've never found a compiler
which would accept a size of -1.
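The negative-size trick above can be wrapped in a macro. The following is a sketch only: COMPILE_TIME_ASSERT and the typedef name are hypothetical, and C11's _Static_assert is shown alongside as the standard replacement.

```c
#include <limits.h>   /* CHAR_BIT */

/* Pre-C11 trick: the array size is 1 when cond is true and -1 when it is
   false, and a negative array size is rejected at compile time.
   The !! normalises any non-zero value to 1. */
#define COMPILE_TIME_ASSERT(name, cond) \
    typedef char name[2 * !!(cond) - 1]

COMPILE_TIME_ASSERT(int_has_at_least_16_bits, sizeof(int) * CHAR_BIT >= 16);

/* C11 and later offer the same check directly, with a readable message. */
_Static_assert(sizeof(int) * CHAR_BIT >= 16, "int must have at least 16 bits");
```

Using a typedef rather than an object means the check occupies no storage; the name only has to be unique within its scope.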
Another use of zero-length arrays is for making variable-length objects (pre-C99). Zero-length arrays are different from flexible arrays, which are written with [] and no 0.
Quoted from gcc doc:
Zero-length arrays are allowed in GNU C. They are very useful as the last element of a structure that is really a header for a variable-length object:
struct line {
int length;
char contents[0];
};
struct line *thisline = (struct line *)
malloc (sizeof (struct line) + this_length);
thisline->length = this_length;
In ISO C99, you would use a flexible array member, which is slightly different in syntax and semantics:
Flexible array members are written as contents[] without the 0.
Flexible array members have incomplete type, and so the sizeof operator may not be applied.
A real-world example is zero-length arrays of struct kdbus_item in kdbus.h (a Linux kernel module).
I'll add that there is a whole page of the online gcc documentation on this topic.
Some quotes:
Zero-length arrays are allowed in GNU C.
In ISO C90, you would have to give contents a length of 1
and
GCC versions before 3.0 allowed zero-length arrays to be statically initialized, as if they were flexible arrays. In addition to those cases that were useful, it also allowed initializations in situations that would corrupt later data
so you could
int arr[0] = { 1 };
and boom :-)
Zero-size array declarations within structs would be useful if they were allowed, and if the semantics were such that (1) they would force alignment but otherwise not allocate any space, and (2) indexing the array would be considered defined behavior in the case where the resulting pointer would be within the same block of memory as the struct. Such behavior was never permitted by any C standard, but some older compilers allowed it before it became standard for compilers to allow incomplete array declarations with empty brackets.
The struct hack, as commonly implemented using an array of size 1, is dodgy and I don't think there's any requirement that compilers refrain from breaking it. For example, I would expect that if a compiler sees int a[1], it would be within its rights to regard a[i] as a[0]. If someone tries to work around the alignment issues of the struct hack via something like
struct {
    uint32_t size;
    uint8_t data[4]; // Use four, to avoid having padding throw off the size of the struct
} *myStruct;
a compiler might get clever and assume the array size really is four:
// As written
foo = myStruct->data[i];
// As interpreted (assuming little-endian hardware)
foo = ((*(uint32_t*)myStruct->data) >> (i << 3)) & 0xFF;
Such an optimization might be reasonable, especially if myStruct->data could be loaded into a register in the same operation as myStruct->size. I know nothing in the standard that would forbid such optimization, though of course it would break any code which might expect to access stuff beyond the fourth element.
By the standard you definitely can't have zero-sized arrays, but nearly every popular compiler lets you declare them anyway. So I will try to explain why that can be bad:
#include <cstdio>
int main() {
struct A {
A() {
printf("A()\n");
}
~A() {
printf("~A()\n");
}
int empty[0];
};
A vals[3];
}
As a human, I would expect this output:
A()
A()
A()
~A()
~A()
~A()
Clang prints this:
A()
~A()
GCC prints this:
A()
A()
A()
It is totally strange, so it is a good reason not to use empty arrays in C++ if you can.
There is also an extension in GNU C that lets you create zero-length arrays in C, but as I understand it there should be at least one other member earlier in the structure, or you will get very strange results like the ones above if you use C++.

Difference between char and char[1]

In C++ what is the difference (if any) between using char and char[1].
examples:
struct SomeStruct
{
char x;
char y[1];
};
Do the same reasons follow for unsigned char?
The main difference is just the syntax you use to access your one char.
By "access" I mean, act upon it using the various operators in the language, most or all of which do different things when applied to a char compared with a char array. This makes it sound as if x and y are almost entirely different. In fact they both "consist of" one char, but that char has been represented in a very different way.
The implementation could cause there to be other differences, for example it could align and pad the structure differently according to which one you use. But I doubt it will.
An example of the operator differences is that a char is assignable, and an array isn't:
SomeStruct a;
a.x = 'a';
a.y[0] = 'a';
SomeStruct b;
b.x = a.x; // OK
b.y = a.y; // not OK
b.y[0] = a.y[0]; // OK
But the fact that y isn't assignable doesn't stop SomeStruct being assignable:
b = a; // OK
All this is regardless of the type, char or not. An object of a type, and an array of that type with size 1, are pretty much the same in terms of what's in memory.
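The same points hold in C and can be checked with a short sketch. The helper names copy_of and same_size are hypothetical and exist only for this illustration.

```c
struct SomeStruct {
    char x;
    char y[1];
};

/* Struct assignment copies every member, including the array y,
   even though y itself cannot be the target of an assignment. */
static struct SomeStruct copy_of(struct SomeStruct src)
{
    struct SomeStruct dst;
    dst = src;
    return dst;
}

/* Returns 1 if a char and an array of one char have the same size. */
static int same_size(void)
{
    return sizeof(char) == sizeof(char[1]);
}
```

Here same_size() returns 1, since an array of one element has no storage beyond that element.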
As an aside, there is a context in which it makes a big difference which you "use" out of char and char[1], and which sometimes helps confuse people into thinking that arrays are really pointers. Not your example, but as a function parameter:
void foo(char c); // a function which takes a char as a parameter
void bar(char c[1]); // a function which takes a char* as a parameter
void baz(char c[12]); // also a function which takes a char* as a parameter
The numbers provided in the declarations of bar and baz are completely ignored by the C++ language. Apparently someone at some point felt that it would be useful to programmers as a form of documentation, indicating that the function baz is expecting its pointer argument to point to the first element of an array of 12 char.
In bar and baz, c never has array type - it looks like an array type, but it isn't, it's just a fancy special-case syntax with the same meaning as char *c. Which is why I put the quotation marks on "use" - you aren't really using char[1] at all, it just looks like it.
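The adjustment of array parameters to pointers applies in C as well, and C11's _Generic can confirm it at compile time. This is a sketch under that assumption: the function bodies and the TAKES_CHAR_POINTER macro are hypothetical additions for illustration.

```c
/* The three declarations from above, with trivial bodies. */
static void foo(char c)     { (void)c; }
static void bar(char c[1])  { c[0] = 'b'; }
static void baz(char c[12]) { c[0] = 'z'; }

/* Evaluates to 1 if f is a function of type void(char *), otherwise 0.
   The selection happens at compile time, based only on the type of &f. */
#define TAKES_CHAR_POINTER(f) \
    _Generic(&(f), void (*)(char *): 1, default: 0)
```

TAKES_CHAR_POINTER yields 1 for both bar and baz and 0 for foo, and bar can be called with the address of a single char, since its parameter is really a char *.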
If you've actually seen the construct char y[1] as the last member of a struct in production code, then it is fairly likely that you've encountered an instance of the struct hack.
That short array is a stand-in for a real but variable-length array (recall that before C99 there was no such thing in the C standard). The programmer would always allocate such structs on the heap, taking care to ensure that the allocation was big enough for the actual size of array that he wanted to use.
As well as the notational differences in usage emphasised by Steve, char[1] can be passed to e.g. template <int N> void f(char(&a)[N]), where char x = '\0'; f(&x); wouldn't match. Reliably capturing the size of array arguments is very convenient and reassuring.
It may also imply something different: either that the real length may be longer (as explained by dmckee), or that the content is logically an ASCIIZ string (that happens to be empty in this case), or an array of characters (that happens to have one element). If the structure was one of several related structures (e.g. a mathematical vector where the array size was a template argument, or an encoding of the layout of memory needed for some I/O operation), then it's entirely possible that some similarity with other fields where the arrays may be larger would suggest a preference for a single-character array, allowing support code to be simpler and/or more universally applicable.