By accident I found that the line char s[] = {"Hello World"}; is properly compiled and seems to be treated the same as char s[] = "Hello World";. Isn't the first ({"Hello World"}) an array containing one element that is an array of char, so the declaration for s should read char *s[]? In fact if I change it to char *s[] = {"Hello World"}; the compiler accepts it as well, as expected.
Searching for an answer, the only place I found that mentions this does not cite the standard.
So my question is: why does the line char s[] = {"Hello World"}; compile, although the left side is of type array of char and the right side appears to be of type array of array of char?
Following is a working program:
#include<stdio.h>
int main() {
char s[] = {"Hello World"};
printf("%s", s); // Same output if line above is char s[] = "Hello World";
return 0;
}
Thanks for any clarifications.
P.S. My compiler is gcc-4.3.4.
It's allowed because the standard says so: C99 section 6.7.8, §14:
An array of character type may be initialized by a character string literal, optionally
enclosed in braces. Successive characters of the character string literal (including the
terminating null character if there is room or if the array is of unknown size) initialize the
elements of the array.
What this means is that both
char s[] = { "Hello World" };
and
char s[] = "Hello World";
are nothing more than syntactic sugar for
char s[] = { 'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', 0 };
On a related note (same section, §11), C also allows braces around scalar initializers like
int foo = { 42 };
which, incidentally, fits nicely with the syntax for compound literals
(int){ 42 }
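The equivalence of the three array spellings can be checked directly. The following is a minimal sketch written as C++ (the same three initializers are also valid C); the names `braced`, `unbraced`, `spelled`, and `same_contents` are made up for illustration.

```cpp
#include <cstring>

// Three spellings of the same initialization.
char braced[]   = { "Hello World" };
char unbraced[] = "Hello World";
char spelled[]  = { 'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', 0 };

// 11 visible characters plus the terminating null character.
static_assert(sizeof(braced) == 12, "braced form has 12 elements");
static_assert(sizeof(unbraced) == 12, "unbraced form has 12 elements");
static_assert(sizeof(spelled) == 12, "spelled-out form has 12 elements");

// Returns true when all three arrays hold identical bytes.
bool same_contents() {
    return std::memcmp(braced, unbraced, sizeof(braced)) == 0 &&
           std::memcmp(braced, spelled, sizeof(braced)) == 0;
}
```

If the braces created an extra level of array, the sizes or contents would differ; they do not.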
The braces are optional, and the braced initializer is equivalent to a plain string literal initializing an array of char.
You can also write this:
int a = {100}; //ok
Demo : http://ideone.com/z0psd
In fact, C++11 generalizes this very syntax, to initialize non-arrays as well as arrays, uniformly. So in C++11, you can have these:
int a{}; //a is initialized to zero, and it is NOT an array
int b[]{1,2,3,4}; //b is an array of size 4 containing elements 1,2,3,4
int c[10]{}; //all 10 elements are initialized to zero
int *d{}; //pointer initialized to nullptr
std::vector<int> v{1,2,3,4,5}; //vector is initialized uniformly as well.
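To confirm what the comments above claim, the same declarations can be placed in a function that inspects the resulting values. This is only a sketch and needs a C++11 compiler; `brace_init_values_match` is a hypothetical name.

```cpp
#include <vector>

// Returns true if the brace-initialized objects hold the values
// described in the comments of the C++11 examples.
bool brace_init_values_match() {
    int a{};                            // zero
    int b[]{1, 2, 3, 4};                // size deduced as 4
    int c[10]{};                        // ten zeros
    int *d{};                           // null pointer
    std::vector<int> v{1, 2, 3, 4, 5};  // five elements

    bool c_all_zero = true;
    for (int x : c) {
        c_all_zero = c_all_zero && (x == 0);
    }

    return a == 0 &&
           sizeof(b) / sizeof(b[0]) == 4 && b[3] == 4 &&
           c_all_zero &&
           d == nullptr &&
           v.size() == 5 && v.back() == 5;
}
```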
A scalar variable (int, char, etc.) may also have its initializer enclosed in braces, so
char s = {0};
works as well.
I might be wrong, but I think this is not an array of arrays of char, but a brace-enclosed initializer that contains an array of char. int a = {1}; may work as well.
[...] In fact if I change it to
char *s[] = {"Hello World"}; the compiler accepts it as well, as
expected
The compiler accepts it because you are actually declaring an array of pointers to char whose size is deduced from the initializer, and you stored only one element in it: a pointer to the "Hello World" string. Something like this:
char* s[] = {"Hello world", "foo", "baa" ...};
You can't omit the braces in this case.
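The difference between the two declarations shows up in what sizeof reports. The sketch below is C++, so the pointer array uses const char * (C++ does not allow a string literal to initialize a plain char *); `text`, `list`, `text_size`, and `list_count` are illustrative names.

```cpp
#include <cstddef>

// One array of char: the braces around the literal are optional.
char text[] = { "Hello World" };

// An array of pointers: each literal initializes one pointer element.
const char *list[] = { "Hello world", "foo", "baa" };

// Bytes in the char array, including the terminating '\0'.
constexpr std::size_t text_size = sizeof(text);

// Number of pointers in the pointer array.
constexpr std::size_t list_count = sizeof(list) / sizeof(list[0]);

static_assert(text_size == 12, "11 characters plus terminator");
static_assert(list_count == 3, "three pointers");
```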
This is allowed by the C++ standard as well. Citation:
[dcl.init.string] §1
An array of narrow character type ([basic.fundamental]), char16_t array, char32_t array, or wchar_t array can be initialized by a narrow string literal, char16_t string literal, char32_t string literal, or wide string literal, respectively, or by an appropriately-typed string literal enclosed in braces ([lex.string]). [snip]
Related
char s[10] = "Test";
How are the remaining chars (after "Test" and terminating null) initialized? (Is it defined?)
Background
I'm doing this to write a custom fixed-width (and ignored) header into an STL file. But I wouldn't like to have random/uninitialized bytes in the remaining space.
The general rule for any array (or struct) where not all members are initialized explicitly, is that the remaining ones are initialized "as if they had static storage duration". Which means that they are set to zero.
So it will actually work just fine to write something weird like this: char s[10] = {'T','e','s','t'};, since the remaining bytes are set to zero and the first of them will be treated as the null terminator.
How are the remaining chars (after "Test" and terminating null) initialized? (Is it defined?)
Yes, it's well defined. In a char array initialized with a string literal and given a size larger than the length of the string literal, all the remaining elements are zero-initialized.
From C++ standard (tip-of-trunk) Character arrays § 9.4.3 [dcl.init.string]
3. If there are fewer initializers than there are array elements, each element not explicitly initialized shall be zero-initialized ([dcl.init]).
Some examples from cppreference:
char a[] = "abc";
// equivalent to char a[4] = {'a', 'b', 'c', '\0'};
// unsigned char b[3] = "abc"; // Error: initializer string too long
unsigned char b[5]{"abc"};
// equivalent to unsigned char b[5] = {'a', 'b', 'c', '\0', '\0'};
wchar_t c[] = {L"кошка"}; // optional braces
// equivalent to wchar_t c[6] = {L'к', L'о', L'ш', L'к', L'а', L'\0'};
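For the fixed-width header use case from the question, the zero fill can be verified with a short loop. This sketch assumes an 80-byte header; `kHeaderSize` and `nonzero_bytes_after_text` are hypothetical names, and the size should be adjusted to whatever the file layout actually requires.

```cpp
#include <cstddef>

constexpr std::size_t kHeaderSize = 80;  // assumed header width

// Counts the non-zero bytes that follow the text in a
// partially initialized buffer. Zero means the tail is clean.
std::size_t nonzero_bytes_after_text() {
    char header[kHeaderSize] = "Test";  // 'T','e','s','t','\0', then zeros
    std::size_t count = 0;
    for (std::size_t i = 4; i < kHeaderSize; ++i) {
        if (header[i] != '\0') {
            ++count;
        }
    }
    return count;
}
```

Because every byte after the text is zero, the whole buffer can be written in one go, for example with std::ostream::write(header, sizeof(header)).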
I am new to the C++ language. Recently I was taught that
we should put '\0' at the end of a char array when initializing it, for example:
char x[6] = "hello"; //OK
However, if you do:
char x[5] = "hello";
then this raises the error:
initializer-string for array of chars is too long
Everything goes as I expect until the expression below, which does not raise a compile error:
char x[5] = {'h','e','l','l','o'};
This really confuses me, so I would like to ask two questions:
1. Why doesn't the expression char x[5] = "hello"; raise an error?
2. To my knowledge, strlen() stops only when it finds '\0' to determine the length of a char array. In this case, what would strlen(x) return?
Thanks!
The string literal "hello" has six characters, because there's an implied nul terminator. So
char x[] = "hello";
defines an array of six char. That's almost always what you want, because the C-style string functions (strlen, strcpy, strcat, etc.) operate on C-style strings, which are, by definition, nul terminated.
But that doesn't mean that every array of char will be nul terminated.
char x[] = { 'h', 'e', 'l', 'l', 'o' };
This defines an array of five char. Applying C-style string functions to this array will result in undefined behavior, because the array does not have a nul terminator.
You can do character-by-character initialization and create a valid C-style string by explicitly including the nul terminator:
char x[] = { 'h', 'e', 'l', 'l', 'o', '\0' };
This defines an array of six char that holds a C-style string (i.e., a nul terminated sequence of characters).
The key here is to separate in your mind the general notion of an array of char from the more specific notion of an array of char that holds a C-style string. The latter is almost always what you want to do, but that doesn't mean that there is never a use for the former. It's just that the former is uncommon.
As an aside, in C you're allowed to elide the nul terminator:
char x[5] = "hello";
This is legal C, and it creates an array of 5 char, with no nul terminator. In C++ that's not legal.
Why doesn't expression char x[5] = "hello"; raise an error?
This is not true. An error is expected in this case.
To my knowledge, the function strlen() would stop only if it finds '\0' to determine the length of the char array, in this case, what would strlen(x) return?
If you manage to compile and run the code somehow, the program has undefined behavior; that is, you cannot rely on any particular result. strlen() only stops counting when it finds a null terminator, so it may run past the end of the char array and read memory that does not belong to it, and that is where the UB is invoked.
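When an array might not be nul terminated, its length can be measured without leaving the buffer by searching at most sizeof(array) bytes. The following is a sketch of one way to do that with std::memchr; `bounded_length`, `terminated`, and `unterminated` are names invented for this example.

```cpp
#include <cstddef>
#include <cstring>

// Length of the text in buf, never reading more than cap bytes.
// Returns cap when no '\0' is found inside the buffer.
std::size_t bounded_length(const char *buf, std::size_t cap) {
    const void *nul = std::memchr(buf, '\0', cap);
    if (nul == nullptr) {
        return cap;
    }
    return static_cast<std::size_t>(static_cast<const char *>(nul) - buf);
}

char terminated[6]   = "hello";                      // holds a C-style string
char unterminated[5] = { 'h', 'e', 'l', 'l', 'o' };  // no '\0' anywhere
```

Calling bounded_length(unterminated, sizeof(unterminated)) is well defined, whereas strlen(unterminated) is not.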
Why can I create a string or array of chars in this way:
#include <iostream>
int main() {
const char *string = "Hello, World!";
std::cout << string[1] << std::endl;
}
It outputs the second element correctly, while I can't make an array of integer type without the array's subscript notation [ ]. What's the difference between the char version and this one: const int* intArray={3,54,12,53};?
The "why" is: "Because string literals are special". The string literal is stored in the binary, as a constant part of the program itself, and const char *string = "Hello, World!"; is just treating the literal as an anonymous array stored elsewhere which it then stores a pointer to in string.
There is no equivalent special behavior for other types, but you can get the same basic solution by making a named static constant and using that to initialize the pointer, e.g.
int main() {
static const int intstatic[] = {3,54,12,53};
const int *intptr = intstatic;
std::cout << intptr[1] << std::endl;
}
The effect of the static const array is to allocate the same constant space the string literal would use (though unlike string literals, it's less likely that the compiler will identify duplicate arrays and coalesce the storage), but as a named variable rather than an anonymous one. The string case could be made explicit in the same way:
int main() {
static const char hellostatic[] = "Hello, World!";
const char *string = hellostatic;
std::cout << string[1] << std::endl;
}
but using the literal directly makes things a little cleaner.
You almost can. There are a couple of things at work.
{1,2,3} and "abc" are not the same thing. In fact, if you wanted to draw a comparison, "abc" should rather be compared to {'a', 'b', 'c', '\0'}. Both of them are valid array initializers:
char foo[] = "abc";
char bar[] = {'a', 'b', 'c', '\0'};
However, only "abc" is also a valid expression to initialize a pointer in C++.
In C (and as an extension in some C++ compilers, including Clang and GCC), you can create an unnamed array with a compound literal, like this:
static const int* array = (const int[]){1, 2, 3};
However, in C++ this is almost never correct. It works at the global scope and as a function argument, but if you try to initialize a variable of automatic storage with it (i.e. a variable within a function), the GCC extension treats the array as a temporary that only lives until the end of the full expression. You get a pointer to a location that is about to expire, so you won't be able to use it for anything useful. (In C itself, a compound literal inside a function lives until the end of the enclosing block.)
Such a feature exists in C and is named compound literal.
For example
#include <stdio.h>
int main(void)
{
const int *intArray = ( int[] ){ 3, 54, 12, 53 };
printf( "%d\n", intArray[1] );
return 0;
}
However C++ does not support this feature from C.
There is a difference compared with string literals. String literals have static storage duration regardless of where they appear, while compound literals have either static or automatic storage duration depending on where they appear.
In C++ something that is close to this feature is std::initializer_list . For example
#include <iostream>
#include <initializer_list>
int main()
{
const auto &myArray = { 3, 54, 12, 53 };
std::cout << myArray.begin()[1] << std::endl;
return 0;
}
String literals come from the C language. In C++, any string written in double quotes in the code has the type array of const char (const char[N], where N includes the terminating '\0').
So this:
const char str[6] = "hello";
is exactly the same as:
const char str[6] = { 'h', 'e', 'l', 'l', 'o', '\0' };
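The type of a string literal can be inspected at compile time. This sketch uses decltype and std::is_same from C++11; `str_literal`, `str_spelled`, and `literal_matches_spelled` are illustrative names.

```cpp
#include <cstring>
#include <type_traits>

// A string literal is an lvalue, so decltype yields a reference
// to its array type: six const char for "hello".
static_assert(std::is_same<decltype("hello"), const char (&)[6]>::value,
              "the literal is an array of 6 const char");

const char str_literal[6] = "hello";
const char str_spelled[6] = { 'h', 'e', 'l', 'l', 'o', '\0' };

// Returns true when both arrays hold the same six bytes.
bool literal_matches_spelled() {
    return std::memcmp(str_literal, str_spelled, sizeof(str_literal)) == 0;
}
```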
I'm a bit baffled that this is allowed:
char num[6] = "a";
What is happening here? Am I assigning a pointer to the array or copying the literal values into the array (and therefore I'm able to modify them later)?
Why can I assign a string literal less than the array itself? What is happening here?
This is well defined. When initializing character arrays with a string literal:
If the size of the array is specified and it is larger than the number
of characters in the string literal, the remaining characters are
zero-initialized.
So,
char num[6] = "a";
// equivalent to char num[6] = {'a', '\0', '\0', '\0', '\0', '\0'};
Am I assigning a pointer to the array or copying the literal values into the array (and therefore I'm able to modify them later)?
The value will be copied, i.e. the elements of the array will be initialized by the chars of the string literal (including '\0').
String literals can be used to initialize character arrays. If an array is initialized like char str[] = "foo";, str will contain a copy of the string "foo".
Successive characters of the string literal (which includes the implicit terminating null character) initialize the elements of the array.
char num[6] = "a";
is equivalent to
char num[6] = {'a', '\0', '\0', '\0', '\0', '\0'};
Why can I assign a string literal less than the array itself?
This is allowed by the language. It is often useful to be able to add more characters to the array later, which wouldn't be possible if the existing characters filled the entire array.
Am I assigning a pointer to the array
No. You cannot assign a pointer to an array, so that is not happening.
or copying the literal values into the array
That is exactly what is happening.
and therefore I'm able to modify them later
You are able to modify the array, indeed.
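A small sketch of the copy-then-modify behaviour described above, with `build_from_literal` as a made-up name. The array is an ordinary writable object, and the zero fill keeps it terminated while characters are appended.

```cpp
#include <string>

// Initializes an array from a literal, then edits the copy in place.
std::string build_from_literal() {
    char num[6] = "a";  // num now holds {'a', 0, 0, 0, 0, 0}
    num[1] = 'b';       // fine: num is an ordinary, writable array
    num[2] = 'c';       // num[3] is still '\0', so the string stays terminated
    return std::string(num);
}
```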
Just use char num[6] = {"a"};. It works.
This kind of declaration is special syntactic sugar. It's equivalent to
char num[6] = {'a', 0};
The array is always modifiable. Its contents after such a declaration are the character 'a', a zero (NUL terminator), and the remainder of the array, which is also zeroed (zero initialization).
That is one kind of declaration, which is equivalent to
char num[6] = {'a','\0'};
You declared a C-string with room for at most 5 ordinary chars, because the last element must be \0 to end the C-string.
In the declaration you can use
char num[6] = "a";
and afterwards you can assign a new value, either
with strcpy(dest,src)
strcpy(num,"test");
or char by char
num[0]='t';
num[1]='e';
num[2]='s';
num[3]='t';
num[4]='\0';
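strcpy does not check the destination size, so a source longer than 5 characters would overflow num. One bounded alternative, sketched here, is std::snprintf, which truncates and always writes the terminator. `copy_bounded` and `fill_six` are hypothetical helpers, not part of any library.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Copies src into a fixed-size array, truncating if needed
// and always leaving the result nul terminated.
template <std::size_t N>
void copy_bounded(char (&dest)[N], const char *src) {
    std::snprintf(dest, N, "%s", src);
}

// Demonstration helper: what a char[6] holds after copying src into it.
std::string fill_six(const char *src) {
    char num[6] = "a";
    copy_bounded(num, src);
    return std::string(num);
}
```

With this helper, fill_six("test") yields "test", while a longer source such as "testing" is cut to the 5 characters that fit.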
I do the following for character array initialisation:
char a[] = "teststring";
char b[]={'a','a','b','b','a'};
For the first, if I need the string length, I must call strlen(a), while for the second I should use sizeof(b)/sizeof(b[0]).
Why this difference?
EDIT : (I got this)
char name[13]="StudyTonight"; //valid character array initialization (12 characters plus '\0'; a size of 10 would be too small)
char name[10]={'L','e','s','s','o','n','s','\0'}; //valid initialization
Remember that when you initialize a character array by listing all its characters separately, you must supply the '\0' character explicitly.
I get that with char b we have to add '\0' to make a proper initialisation.
ANOTHER :
Therefore, the array of char elements called myword can be initialized with a null-terminated sequence of characters by either one of these two statements:
char myword[] = { 'H', 'e', 'l', 'l', 'o', '\0' };
char myword[] = "Hello";
A string literal, like "teststring" contains the characters between the double-quotes, plus a terminating char with value zero. So
char a[] = "ab";
has the same effect as
char a[] = {'a', 'b', '\0'};
strlen() searches for that character with value '\0'. So strlen(a) in this case will return 2.
Conversely, sizeof() gets the actual size of the memory used. Since sizeof(char) is 1, by definition in the standard, this means sizeof(a) gives the value 3: it counts the 'a', the 'b', and the '\0'.
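The two measurements can be put side by side. In this sketch the second array contains an embedded '\0' to show that strlen stops at the first terminator while sizeof still reports the whole array; `ab`, `gap`, `text_length`, and `storage_size` are illustrative names.

```cpp
#include <cstddef>
#include <cstring>

char ab[]  = "ab";     // {'a', 'b', '\0'}
char gap[] = "a\0b";   // {'a', '\0', 'b', '\0'}

// Characters before the first '\0'.
std::size_t text_length(const char *s) {
    return std::strlen(s);
}

// Bytes occupied by each whole array, terminators included.
constexpr std::size_t ab_storage_size  = sizeof(ab);
constexpr std::size_t gap_storage_size = sizeof(gap);
```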
a is a C-style string, i.e, null-terminated char array. The initialization is equivalent to:
char a[] = {'t','e','s','t','s','t','r','i','n','g','\0'};
b, however, is not null-terminated, so it's not a C-style string, you can't use functions like std::strlen() because they are only valid for C-style strings.
String literals are expanded into char arrays, but also include the terminating zero char. So think of
char a[] = "teststring";
as if you had typed this
char a[] = {'t','e','s','t','s','t','r','i','n','g','\0'};
A rule of thumb
Whenever you will use strlen() on a char array, use a string literal for its initialization. The strlen function can be thought of as a simple scan for the terminating zero char (\0) that counts the iterations needed.
A word about sizeof
Even if sometimes used with parentheses, sizeof is an operator, an integral part of the C++ language (inherited from C). In cases like char c[] = "hello";, sizeof(c) will return 6, which is exactly 1 more than strlen(c), and you might be thinking: "let's skip that inefficient scanning for the terminator". But sizeof stops being useful for this as soon as it is applied to a pointer, where it yields the size of the pointer itself, and arrays can (and will) be converted to pointers whenever required, for example when passed to a function. Look at the following example:
#include <iostream>
// naive approach, don't do that
int myarraysize(char s[])
{
return sizeof(s);
}
int main ()
{
char c[] = "hello";
std::cout << sizeof(c) << " vs " << myarraysize(c) << std::endl;
return 0;
}
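If the array size is needed inside a function, one option is to pass the array by reference so that it does not decay to a pointer. The template below is a sketch of that idiom; `array_size`, `hello`, and `numbers` are names chosen for this example.

```cpp
#include <cstddef>

// Takes the array by reference, so N is deduced from the array type
// instead of being lost in an array-to-pointer conversion.
template <typename T, std::size_t N>
constexpr std::size_t array_size(const T (&)[N]) {
    return N;
}

char hello[] = "hello";           // 5 letters plus '\0'
int numbers[] = {3, 54, 12, 53};  // works for any element type

static_assert(array_size("hello") == 6, "literal includes the terminator");
```

Passing a plain pointer to array_size fails to compile, which is the safer outcome compared with silently returning the pointer size. Since C++17 the standard library offers std::size for the same purpose.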
You can always write
char b[]={'a','a','b','b','a','\0'};
to overcome "the difference".
Also note
sizeof(b)/sizeof(b[0])
essentially boils down to
sizeof(b)
since sizeof(char) is always 1. Your formula is the general form, needed for arrays of any other element type.