How does the compiler fill values in char array[100] = {0};? What's the magic behind it?
I want to know how the compiler performs this initialization internally.
It's not magic.
The behavior of this code in C is described in section 6.7.8.21 of the C specification (online draft of C spec): for the elements that don't have a specified value, the compiler initializes pointers to NULL and arithmetic types to zero (and recursively applies this to aggregates).
The behavior of this code in C++ is described in section 8.5.1.7 of the C++ specification (online draft of C++ spec): the compiler aggregate-initializes the elements that don't have a specified value.
Also, note that in C++ (but not C), you can use an empty initializer list, causing the compiler to aggregate-initialize all of the elements of the array:
char array[100] = {};
As for what sort of code the compiler might generate when you do this, take a look at this question: Strange assembly from array 0-initialization
Implementation is up to compiler developers.
If your question is "what will happen with such a declaration": the compiler will set the first array element to the value you've provided (0), and all the others will be set to zero because that is the default value for omitted array elements.
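As a quick check of that rule, here is a minimal sketch (the function names are hypothetical, chosen only for illustration). It shows that a single explicit initializer only sets the first element, and that every omitted element ends up as zero rather than repeating the given value.

```cpp
#include <cstddef>

// Returns true if every element of the partially initialized array is zero.
bool all_zero_after_partial_init()
{
    char array[100] = {0};  // first element explicit, the other 99 implicit
    for (std::size_t i = 0; i < sizeof array; ++i)
        if (array[i] != 0)
            return false;
    return true;
}

// Only values[0] receives 7; the omitted elements are zero, not 7.
int sum_of_partial_init()
{
    int values[4] = {7};
    return values[0] + values[1] + values[2] + values[3];
}
```

The second function is the one worth remembering: {7} does not mean "fill with 7".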
If your compiler is GCC, you can also use the following syntax:
int array[256] = {[0 ... 255] = 0};
Please look at
http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Designated-Inits.html#Designated-Inits, and note that this is a compiler-specific feature.
It depends where you put this initialisation.
If the array is static as in
char array[100] = {0};
int main(void)
{
...
}
then it is the compiler that reserves the 100 zero bytes in the data segment of the program. In this case you could have omitted the initialiser.
If your array is auto, then it is another story.
int foo(void)
{
char array[100] = {0};
...
}
In this case at every call of the function foo you will have a hidden memset.
The code above is equivalent to
int foo(void)
{
char array[100];
memset(array, 0, sizeof(array));
....
}
and if you omit the initializer, your array will contain indeterminate data (whatever happens to be on the stack).
If your local array is declared static like in
int foo(void)
{
static char array[100] = {0};
...
}
then it is technically the same case as the first one.
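The difference between the automatic and the static case can be observed with a small sketch like the one below. The function names are hypothetical, and the behavior shown is only what the language guarantees, not what a particular compiler emits.

```cpp
// Automatic array: the initializer is applied again on every call,
// so the value written by a previous call is never visible.
int first_byte_of_auto_array()
{
    char array[16] = {0};
    int before = array[0];
    array[0] = 'x';
    return before;
}

// Static local array: initialized once, so a write survives between calls.
int first_byte_of_static_array()
{
    static char array[16] = {0};
    int before = array[0];
    array[0] = 'x';
    return before;
}
```

Calling the first function twice yields 0 both times, while the second call of the static variant sees the 'x' stored by the first call.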
I am trying to find out the correct way to initialise an array to all zeros (i.e. as if you have done a memset on the array).
I have found the following methods from various places on Stack Overflow (and other sources):
char myArray1[10] = {0};
char myArray2[10] = {0,};
char myArray3[10] = {[0 ... 9] = 0};
char myArray4[10] = {0,0,0,0,0,0,0,0,0,0};
I would prefer the simplest syntax variant... I was using {0}, but I have not found any proof this actually is correct.
Missing elements in an array will be initialised to 0. In addition, C++ allows you to leave the uniform initialiser empty. So the following works, is minimal and also the most efficient:
T array[N] = {};
It’s worth noting that this works for any type T which can be either default-constructed or initialised, not just built-in types. For example, the following works, and will print foo() five times:
#include <iostream>
struct foo {
foo() { std::cout << "foo()\n"; }
};
int main() {
foo arr[5] = {};
}
A more extensive list of the different possibilities was posted by aib some time ago.
From the C++ specification, "Aggregate initialization" (8.5.1):
If there are fewer initializer-clauses in the list than there are members in the aggregate, then each member not explicitly initialized shall be initialized from an empty initializer list.
So each char not in the initializer list would be initialized to char(), that is, 0.
In C++11 you can type:
char a[10] = {};
char b[10]{};
In C (before C23), and with some old C++ compilers, you may need to supply at least one element:
char a[10] = {0};
Naturally, if the array has static lifetime (global or static variable), then it will be zero-initialized if there is no initializer:
char global_array[10];
I find it confusing, so I prefer to add the = {} anyway.
About the trailing comma, it is useful if you do something like:
char a[] = {
1,
2,
3,
};
So that you don't make a special case for the last line and you make copy & paste and diffs easier. In your specific case it is just useless:
char a[10] = {0,};
That comma does nothing, and it is ugly, so I wouldn't write it.
I prefer this because it is simple yet explicit:
char myArray1[10] = { 0 };
I have a global array, which is indexed by the values of an enum, which has an element representing number of values. The array must be initialized by a special value, which unfortunately is not a 0.
enum {
A, B, C, COUNT
};
extern const int arr[COUNT];
In a .cpp file:
const int arr[COUNT] = { -1, -1, -1 };
The enum is occasionally changed: new values are added, some get removed. The error in my code, which I just fixed, was an insufficient number of initialization values, which caused the rest of the array to be initialized with zeroes. I would like to put a safeguard against this kind of error.
The problem is to either guarantee that arr is always completely initialized with the special value (the -1 in the example) or to break compilation to get the developer's attention, so the array can be updated manually.
The recent C++ standards are not available (old MS compilers and some proprietary junk). Templates can be used, to an extent. STL and Boost are strongly prohibited (don't ask), but I won't mind copying or reimplementing the needed parts.
If it turns out to be impossible, I will have to consider changing the special value to be 0, but I would like to avoid that: the special value (the -1) might be a bit too special and encoded implicitly in the rest of the code.
I would like to avoid DSL and code generation: the primary build system is jam on ms windows and it is major PITA to get anything generated there.
The best solution I can come up with is to replace arr[COUNT] with arr[], and then write a template to assert that sizeof(arr) / sizeof(int) == COUNT. This won't ensure that it's initialized to -1, but it will ensure that you've explicitly initialized the array with the correct number of elements.
C++11's static_assert would be even better, or Boost's macro version, but if you don't have either available, you'll have to come up with something on your own.
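A minimal sketch of such a hand-rolled check, assuming a pre-C++11 compiler with basic template support. CompileTimeCheck and arr_size_ok are hypothetical names introduced here for illustration, not part of any library.

```cpp
enum { A, B, C, COUNT };

// The bound is deduced from the initializer list, so sizeof(arr) reflects
// the number of values actually written here.
const int arr[] = { -1, -1, -1 };

// Hypothetical C++03 compile-time assertion: only the <true> specialization
// is defined, so a false condition names an incomplete type and the
// translation unit fails to compile.
template <bool Condition> struct CompileTimeCheck;
template <> struct CompileTimeCheck<true> { enum { value = 1 }; };

// Breaks the build when the initializer count and COUNT drift apart.
enum { arr_size_ok = CompileTimeCheck<(sizeof(arr) / sizeof(arr[0]) == COUNT)>::value };
```

For the check to work, the header would have to declare the array without a bound (extern const int arr[];). The cost is that other translation units then see an incomplete type and cannot apply sizeof to arr.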
This is easy.
enum {
A, B, C, COUNT
};
extern const int (&arr)[COUNT];
const int (&arr)[COUNT] = (int[]){ -1, -1, -1};
int main() {
arr[C];
}
At first glance this appears to produce overhead, but when you examine it closely, it simply produces two names for the same variable as far as the compiler cares. So no overhead.
Here it is working: http://ideone.com/Zg32zH, and here's what happens in the error case: http://ideone.com/yq5zt3
prog.cpp:6:27: error: invalid initialization of reference of type ‘const int (&)[3]’ from expression of type ‘const int [2]’
For some compilers you may need to name the temporary
const int arr_init[] = { -1, -1, -1};
const int (&arr)[COUNT] = arr_init;
update
I've been informed the first =(int[]){-1,-1,-1} version is a compiler extension, and so the second =arr_init; version is to be preferred.
Answering my own question: while it seems to be impossible to provide the array with the right amount of initializers directly, it is really easy to just test the list of initializers for the right amount:
#define INITIALIZERS -1, -1, -1,
struct check {
check() {
const char arr[] = {INITIALIZERS};
typedef char t[sizeof(arr) == COUNT ? 1: -1];
}
};
const int arr[COUNT] = { INITIALIZERS };
Thanks @dauphic for the idea to use a variable array to count the values.
The Boost.Preprocessor library might provide something useful, but I doubt whether you will be allowed to use it and it might turn out to be unwieldy to extract from the Boost sources.
This similar question has an answer that looks helpful:
Trick : filling array values using macros (code generation)
The closest I could get to an initialization rather than a check is to use a const reference to an array, then initialize that array within a global object. It's still runtime initialization, but idk how you're using it so this may be good enough.
#include <cstring>
enum {A, B, C, COUNT};
namespace {
class ArrayHolder {
public:
int array[COUNT]; // internal array
ArrayHolder () {
// initialize to all -1s
memset(this->array, -1, sizeof(this->array));
}
};
const ArrayHolder array_holder; // static global container for the array
}
const int (&arr)[COUNT] = array_holder.array; // reference to array initialized
// by ArrayHolder constructor
You can still use the sizeof on it as you would before:
for (size_t i=0; i < sizeof(arr)/sizeof(arr[0]); ++i) {
// do something with arr[i]
}
Edit
If the runtime initialization can never be relied on, you should check your implementation details in the generated assembly, because the values of arr, even when declared with an initializer, may still not be known until runtime initialization:
const int arr[1] = {5};
int main() {
int local_array[arr[0]]; // use arr value as length
return 0;
}
compiling with g++ -pedantic gives the warning:
warning: ISO C++ forbids variable length array ‘local_array’ [-Wvla]
another example where compilation actually fails:
const int arr1[1] = {5};
int arr2[arr1[0]];
error: array bound is not an integer constant before ']' token
As for using an array value as an argument to a global constructor, both constructor calls here are fine:
// [...ArrayHolder definition here...]
class IntegerWrapper{
public:
int value;
IntegerWrapper(int i) : value(i) {}
};
const int (&arr)[COUNT] = array_holder.array;
const int arr1[1] = {5};
IntegerWrapper iw1(arr1[0]); //using = {5}
IntegerWrapper iw2(arr[0]); //using const reference
Additionally, the order of initialization of global variables across different source files is not defined, so you can't guarantee that arr = {-1, -1, -1}; won't be deferred until run time. If the compiler is optimizing out the initialization, then you're relying on the implementation, not the standard.
The point I really wanna stress here is: int arr[COUNT] = {-1, -1, -1}; is still runtime initialization unless it can get optimized out. The only way you could rely on it being constant would be to use C++11's constexpr but you don't have that available.
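For readers who do have a C++11 compiler (which the question rules out), the constexpr route could look roughly like the sketch below. The helper all_equal is a hypothetical name. It is written as a single return statement because C++11 constexpr functions do not allow loops.

```cpp
enum { A, B, C, COUNT };

// Hypothetical helper: recursively checks that the first n elements equal v.
constexpr bool all_equal(const int* p, int n, int v)
{
    return n == 0 || (p[0] == v && all_equal(p + 1, n - 1, v));
}

// constexpr requires the initializer to be a constant expression,
// so the array is initialized at compile time.
constexpr int arr[COUNT] = { -1, -1, -1 };

// A forgotten initializer leaves a trailing 0 and trips this assertion.
static_assert(all_equal(arr, COUNT, -1), "arr must be fully initialized with -1");
```

This catches both problems from the question at once: too few initializers (the zero-filled tail fails the comparison) and too many (the array definition itself is ill-formed).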
As the problem stated, this is doable:
#include <iostream>
int main(int argc, char *argv[])
{
unsigned short int i;
std::cin >> i;
unsigned long long int k[i][i];
}
Here I declared an array that is sized i by i, both dimensions are variables.
But not this:
#include <iostream>
int main(int argc, char *argv[])
{
unsigned short int i;
std::cin >> i;
unsigned long long int** k = new int[i][i];
delete[] k;
}
I got a compiler message telling me that
error: only the first dimension of an allocated array may have dynamic size
I am forced to do this:
#include <iostream>
int main(int argc, char *argv[])
{
unsigned short int i;
std::cin >> i;
unsigned long long int** k = new unsigned long long int*[i];
for ( unsigned short int idx = 0 ; idx < i ; ++ idx )
k[idx] = new unsigned long long int[i];
for ( unsigned short int idx = 0 ; idx < i ; ++ idx )
delete[] k[idx];
delete[] k;
}
To my understanding, new and delete are used to allocate something on the heap, not on the stack, which won't be deleted when it goes out of scope, and is useful for passing data across functions and objects, etc.
What I don't understand is what happens when I declare that k in the first example, I am told that declared array should (and could) only have constant dimensions, and when in need for a array of unknown size, one should always consider new & delete or vectors.
Are there any pros and cons to those two solutions that I'm not getting, or is it just what it is?
I'm using Apple's LLVM compiler by the way.
Neither form is C++ standard compliant, because the standard does not support variable-length arrays (VLAs) (interestingly, C99 does - but C is not C++). However, several compilers have an extension to support this, including your compiler:
From Clang's Manual:
Clang supports such variable length arrays in very limited circumstances for compatibility with GNU C and C99 programs:
The element type of a variable length array must be a POD ("plain old data") type, which means that it cannot have any user-declared constructors or destructors, any base classes, or any members of non-POD type. All C types are POD types.
Variable length arrays cannot be used as the type of a non-type template parameter.
But given that the extension is in place, why doesn't your second snippet work? That's because VLA only applies to automatic variables - that is, arguments or local variables. k is automatic but it's just a pointer - the array itself is defined by new int[i][i], which allocates on the heap and is decidedly not an automatic variable.
You can read more about this on the relevant GCC manual section.
I'm sure you can find implementation for 2D array functionality easily, but you can make your own class too. The simplest way is to use std::vector to hold the data and have an index-mapping function that takes your two coordinates and return a single index into the vector.
The client code will look a little different, instead of arr[x][y] you have arr.at(x,y) but otherwise it does the same. You do not have to fiddle with memory management as that is done by std::vector, just use v.resize(N*N) in constructor or dimension-setting function.
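A minimal sketch of such a class, assuming row-major storage. Grid2D and its member names are hypothetical, and the element type is taken from the question.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: a rows x cols grid stored in one contiguous vector.
class Grid2D {
public:
    Grid2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}

    // Row-major mapping of (row, col) to a single index. std::vector::at
    // throws std::out_of_range only if the flattened index is past the end;
    // it does not check row and col individually.
    unsigned long long& at(std::size_t row, std::size_t col) {
        return data_.at(row * cols_ + col);
    }
    const unsigned long long& at(std::size_t row, std::size_t col) const {
        return data_.at(row * cols_ + col);
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<unsigned long long> data_;  // elements start out as zero
};
```

With this, Grid2D k(i, i); replaces both the VLA and the nested new[] loops from the question, and the memory is released automatically when k goes out of scope.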
Essentially what compilers generally do with two-dimensional arrays (fixed or variable) is this:
int arr[x][y] ---> int arr[x*y];
arr[2][4]= something ---> arr[2*y+4]= something;
Basically they are just a nicer way of notation of a one-dimensional array (on the stack). Most compilers require fixed sizes, so the compiler has an easier way of telling what the dimensions are (and thus what to multiply with). It appears you have just a compiler, which can keep track of the dimensions (and multipliers) even if you use variables.
Of course you can mimic that with new[] yourself too, but it's not supported by the compiler per se.
Probably for the same reason, i.e. because it would be even harder keeping track of the dimensions, especially when moving the pointers around.
E.g. with a new-pointer you could later write:
newarr= someotherarray;
and someotherarray could be something with entirely different dimensions. If the compiler did a 2-dim -> 1-dim translation, it would have to track all possible size transitions.
With the stack allocated arr above, this isn't necessary, because at least once the compiler made it, it stays that size.
I am trying to use Xcode for my project and have this code in my .h:
class FileReader
{
private:
int numberOfNodes;
int startingNode;
int numberOfTerminalNodes;
int terminalNode[];
int numberOfTransitions;
int transitions[];
public:
FileReader();
~FileReader();
};
I get a "Field has incomplete type int[]" error on the terminalNode line... but not on the transitions line. What could be going on? I'm SURE that's the correct syntax?
Strictly speaking the size of an array is part of its type, and an array must have a (greater than zero) size.
There's an extension that allows an array of indeterminate size as the last element of a class. This is used to conveniently access a variable sized array as the last element of a struct.
struct S {
int size;
int data[];
};
S *make_s(int size) {
S *s = (S*)malloc(sizeof(S) + sizeof(int)*size);
s->size = size;
return s;
}
int main() {
S *s = make_s(4);
for (int i=0;i<s->size;++i)
s->data[i] = i;
free(s);
}
This code is unfortunately not valid C++, but it is valid C (C99 or C11). If you've inherited this from some C project, you may be surprised that this works there but not in C++. But the truth of the matter is that you can't have zero-length arrays (which is what the incomplete array int transitions[] is in this context) in C++.
Use a std::vector<int> instead. Or a std::unique_ptr<int[]>.
(Or, if you're really really really fussy about not having two separate memory allocations, you can write your own wrapper class which allocates one single piece of memory and in-place constructs both the preamble and the array. But that's excessive.)
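As a rough sketch of the std::vector suggestion applied to the class from the question: the two-argument constructor and the accessor names are assumptions added for illustration, since the original only declares FileReader() and ~FileReader().

```cpp
#include <cstddef>
#include <vector>

class FileReader
{
private:
    int numberOfNodes;
    int startingNode;
    std::vector<int> terminalNodes;  // replaces numberOfTerminalNodes + terminalNode[]
    std::vector<int> transitions;    // replaces numberOfTransitions + transitions[]
public:
    // Hypothetical constructor: sizes are supplied at run time.
    FileReader(std::size_t terminalCount, std::size_t transitionCount)
        : numberOfNodes(0), startingNode(0),
          terminalNodes(terminalCount), transitions(transitionCount) {}

    // The vectors track their own sizes, so no separate counters are needed.
    std::size_t numberOfTerminalNodes() const { return terminalNodes.size(); }
    std::size_t numberOfTransitions() const { return transitions.size(); }

    // Bounds-checked element access.
    int& terminalNode(std::size_t i) { return terminalNodes.at(i); }
};
```

No user-written destructor is required here, because each vector frees its own storage.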
The original C use would have been something like:
FileReader * p = malloc(sizeof(FileReader) + N * sizeof(int));
Then you could have used p->transitions[i], for i in [0, N).
Such a construction obviously doesn't make sense in the object model of C++ (think constructors and exceptions).
You can't put an unbound array length in a header -- there is no way for the compiler to know the class size, thus it can never be instantiated.
It's likely that the lack of an error on the transitions line is a result of handling the first error. That is, if you comment out terminalNode, transitions should give the error.
It isn't. If you're inside a struct definition, the compiler needs to know the size of the struct, so it also needs to know the size of all its elements. Because int [] means an array of ints of any length, its size is unknown. Either use a fixed-size array (int field[128];) or a pointer that you'll use to malloc memory (int *field;).
In C99, you can declare a flexible array member of a struct as such:
struct blah
{
int foo[];
};
However, when someone here at work tried to compile some code using clang in C++, that syntax did not work. (It had been working with MSVC.) We had to convert it to:
struct blah
{
int foo[0];
};
Looking through the C++ standard, I found no reference to flexible member arrays at all; I always thought [0] was an invalid declaration, but apparently for a flexible member array it is valid. Are flexible member arrays actually valid in C++? If so, is the correct declaration [] or [0]?
C++ was first standardized in 1998, so it predates the addition of flexible array members to C (which was new in C99). There was a corrigendum to C++ in 2003, but that didn't add any relevant new features. The next revision of C++ (C++2b) is still under development, and it seems flexible array members still aren't added to it.
C++ doesn't support C99 flexible array members at the end of structures, either using an empty index notation or a 0 index notation (barring vendor-specific extensions):
struct blah
{
int count;
int foo[]; // not valid C++
};
struct blah
{
int count;
int foo[0]; // also not valid C++
};
As far as I know, C++0x will not add this, either.
However, if you size the array to 1 element:
struct blah
{
int count;
int foo[1];
};
the code will compile, and work quite well, but it is technically undefined behavior. You can allocate the appropriate memory with an expression that is unlikely to have off-by-one errors:
struct blah* p = (struct blah*) malloc( offsetof(struct blah, foo[desired_number_of_elements]));
if (p) {
p->count = desired_number_of_elements;
// initialize your p->foo[] array however appropriate - it has `count`
// elements (indexable from 0 to count-1)
}
So it's portable between C90, C99 and C++ and works just as well as C99's flexible array members.
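The size computation can also be wrapped in small helpers, as in the sketch below. The names blah_alloc_size and blah_create are hypothetical. The sketch computes the size as offsetof(blah, foo) plus n elements, which avoids a run-time index inside offsetof, and it inherits the same caveat as above: indexing foo beyond its declared bound remains technically undefined behavior.

```cpp
#include <cstddef>
#include <cstdlib>

struct blah
{
    int count;
    int foo[1];
};

// Bytes needed for the header plus n trailing ints (n >= 1 assumed).
std::size_t blah_alloc_size(std::size_t n)
{
    return offsetof(blah, foo) + n * sizeof(int);
}

// Hypothetical helper: allocates a blah with room for n ints and returns a
// null pointer on failure. The caller releases the block with std::free.
blah* blah_create(std::size_t n)
{
    if (n == 0)
        n = 1;  // never allocate fewer elements than the declared array holds
    blah* p = static_cast<blah*>(std::malloc(blah_alloc_size(n)));
    if (p) {
        p->count = static_cast<int>(n);
        p->foo[0] = 0;
    }
    return p;
}
```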
Raymond Chen did a nice writeup about this: Why do some structures end with an array of size 1?
Note: In Raymond Chen's article, there's a typo/bug in an example initializing the 'flexible' array. It should read:
for (DWORD Index = 0; Index < NumberOfGroups; Index++) { // note: used '<' , not '='
TokenGroups->Groups[Index] = ...;
}
If you can restrict your application to only require a few known sizes, then you can effectively achieve a flexible array with a template.
template <typename BASE, typename T, unsigned SZ>
struct Flex : public BASE {
T flex_[SZ];
};
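A possible way to use the template, sketched under the assumption that the fixed part of the structure carries a count field. PacketHeader and sum_entries are hypothetical names, and the template is repeated so that the example is self-contained.

```cpp
// Hypothetical fixed header shared by every size variant.
struct PacketHeader {
    int count;  // number of valid entries in flex_
};

template <typename BASE, typename T, unsigned SZ>
struct Flex : public BASE {
    T flex_[SZ];
};

// Works for any of the known sizes; SZ is deduced from the argument.
template <unsigned SZ>
int sum_entries(const Flex<PacketHeader, int, SZ>& packet)
{
    int total = 0;
    // Stop at count, but never read past the SZ elements that exist.
    for (unsigned i = 0; i < SZ && i < static_cast<unsigned>(packet.count); ++i)
        total += packet.flex_[i];
    return total;
}
```

Each distinct SZ produces a distinct type, which is why this only helps when the set of sizes is known at compile time.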
The second one will not contain elements but rather will point right after blah. So if you have a structure like this:
struct something
{
int a, b;
int c[0];
};
you can do things like this:
struct something *val = (struct something *)malloc(sizeof(struct something) + 5 * sizeof(int));
val->a = 1;
val->b = 2;
val->c[0] = 3;
In this case c will behave as an array with 5 ints but the data in the array will be after the something structure.
The product I'm working on uses this as a sized string:
struct String
{
unsigned int allocated;
unsigned int size;
char data[0];
};
Because of the supported architectures this will consume 8 bytes plus allocated.
Of course all this is C but g++ for example accepts it without a hitch.
If you only want
struct blah { int foo[]; };
then you don't need the struct at all and you can simply deal with a malloc'ed/new'ed int array.
If you have some members at the beginning:
struct blah { char a,b; /*int foo[]; //not valid in C++*/ };
then in C++, I suppose you could replace foo with a foo member function:
struct blah { alignas(int) char a,b;
int *foo(void) { return reinterpret_cast<int*>(&this[1]); } };
Example use:
#include <stdlib.h>
struct blah {
alignas(int) char a,b;
int *foo(void) { return reinterpret_cast<int*>(&this[1]); }
};
int main()
{
blah *b = (blah*)malloc(sizeof(blah)+10*sizeof(int));
if(!b) return 1;
b->foo()[1]=1;
}
A proposal is underway, and might make it into some future C++ version.
See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1039r0.html for details (the proposal is fairly new, so it's subject to changes)
I faced the same problem of declaring a flexible array member that can be used from C++ code. Looking through the glibc headers, I found some usages of flexible array members, e.g. in struct inotify_event, which is declared as follows (comments and some unrelated members omitted):
struct inotify_event
{
//Some members
char name __flexarr;
};
The __flexarr macro, in turn is defined as
/* Support for flexible arrays.
Headers that should use flexible arrays only if they're "real"
(e.g. only if they won't affect sizeof()) should test
#if __glibc_c99_flexarr_available. */
#if defined __STDC_VERSION__ && __STDC_VERSION__ >= 199901L
# define __flexarr []
# define __glibc_c99_flexarr_available 1
#elif __GNUC_PREREQ (2,97)
/* GCC 2.97 supports C99 flexible array members as an extension,
even when in C89 mode or compiling C++ (any version). */
# define __flexarr []
# define __glibc_c99_flexarr_available 1
#elif defined __GNUC__
/* Pre-2.97 GCC did not support C99 flexible arrays but did have
an equivalent extension with slightly different notation. */
# define __flexarr [0]
# define __glibc_c99_flexarr_available 1
#else
/* Some other non-C99 compiler. Approximate with [1]. */
# define __flexarr [1]
# define __glibc_c99_flexarr_available 0
#endif
I'm not familiar with the MSVC compiler, but you'd probably have to add one more conditional macro depending on the MSVC version.
Flexible arrays are not part of the C++ standard yet. That is why int foo[] or int foo[0] may not compile. While there is a proposal being discussed, it has not been accepted to the newest revision of C++ (C++2b) yet.
However, almost all modern compilers do support it via compiler extensions.
GCC has a zero-length array extension, which is also supported for C++.
Clang aims to support a broad range of GCC extensions.
MSVC has a non-standard extension and a warning associated with it.
The catch is that if you use this extension with the highest warning level (-Wall --pedantic), it may result in a warning.
A workaround to this is to use an array with one element and do access out of bounds. While this solution is UB by the spec (dcl.array and expr.add), most of the compilers will produce valid code and even clang -fsanitize=undefined is happy with it:
#include <new>
#include <type_traits>
struct A {
int a[1];
};
int main()
{
using storage_type = std::aligned_storage_t<1024, alignof(A)>;
static storage_type memory;
A *ptr_a = new (&memory) A;
ptr_a->a[2] = 42;
return ptr_a->a[2];
}
Having all that said, if you want your code to be standard compliant and do not depend on any compiler extension, you will have to avoid using this feature.
Flexible array members are not supported in standard C++; however, the Clang documentation says:
"In addition to the language extensions listed here, Clang aims to support a broad range of GCC extensions."
The GCC documentation for C++ says:
"The GNU compiler provides these extensions to the C++ language (and you can also use most of the C language extensions in your C++ programs)."
And the gcc documentation for C documents support for arrays of zero length.
https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
The better solution is to declare it as a pointer:
struct blah
{
int* foo;
};
Or better yet, to declare it as a std::vector:
struct blah
{
std::vector<int> foo;
};