How is char array stored in C++? - c++

#include <stdio.h>

int main()
{
    char c1[5]="abcde";
    char c2[5]={'a','b','c','d','e'};
    char *s1 = c1;
    char *s2 = c2;
    printf("%s",s1);
    printf("%s",s2);
    return 0;
}
In this code snippet, the char array C2 compiles without any error, but the char array C1 produces a "string too long" error. I know that C1 must require a size of 6 to store 5 characters as it stores the \0 (NULL char) in the last index. But I'm confused why C2 works just fine then?
Also, when C2 is printed using %s, the output is abcde# where # is a gibberish character. %s with printf prints all the characters starting from the given address until \0 is encountered. I don't understand why it is printing that extra character at the end.

You've created two unterminated strings. Make your arrays big enough to hold the null terminator and you'll avoid this undefined behaviour:
char c1[6] = "abcde";
char c2[6] = {'a','b','c','d','e','\0'};
Strictly speaking, the latter doesn't actually require the '\0'. This declaration is equivalent and will include the null terminator:
char c2[6] = {'a','b','c','d','e'};
I personally prefer the first form, but with the added convenience of being able to leave out the explicit length:
char c1[] = "abcde";

I know that C1 must require a size of 6 to store 5 characters as it stores the \0 (NULL char) in the last index. But I'm confused why C2 works just fine then?
The compiler does not complain about the initialization of c2 because initializing with {'a','b','c','d','e'} does not implicitly include a terminating null character.
In contrast, initializing with "abcde" does include a null character: The C standard defines a string literal to include a terminating null character, so char c1[5]="abcde"; nominally initializes a 5-element array with 6 values. The C standard does not require a warning or error in this case because C 2018 6.7.9 14 indicates that the null character may be neglected if the array does not have room for it. However, the compiler you are using [1] has chosen to issue a warning message because this form of initialization often indicates an error: The programmer attempted to initialize an array with a string, but there is not room for the full string.
In C, arrays of characters and strings are different things: An array is a sequence of values, and an array of characters can contain any arbitrary values of those characters, including no zero value at the end and possible zero values in the middle. For example, if we have a buffer of bytes from a binary file, the bytes are just integer values to us; their meaning as characters that might be printed is irrelevant. A string is a sequence of characters that is terminated by a null character. It cannot have internal zero values because the first null character marks the end.
So, when you define an array of characters such as char c1[5], the compiler does not automatically know whether you intend to use it to hold strings or as an array of arbitrary values. When you initialize the array with a string, your compiler essentially figures you intend to use the array to hold strings, and it warns you if the string you use to initialize the array does not fit. When you initialize the array with a list of values, your compiler essentially figures you may be using it to hold arbitrary values, and it does not warn you that there could be a missing terminator.
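To make the array-versus-string distinction concrete, here is a minimal sketch of a check for whether a char array currently holds a string. The helper name holds_string is hypothetical and not part of the original answer; it only relies on the standard memchr function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an illustration, not from the original answer):
   returns 1 if the first `size` bytes of `buf` contain a null character,
   meaning the array holds a string, and 0 if it is only an unterminated
   array of characters. It never reads past `size` bytes. */
int holds_string(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}
```

For an array declared as char raw[5] = {'a','b','c','d','e'}; the call holds_string(raw, sizeof raw) should report 0, while an array with a zero in the middle counts as a (shorter) string.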
Also, when C2 is printed using %s, the output is abcde# where # is a gibberish character.
Because c2 does not have a terminating character, attempting to print it runs off the end of the array, resulting in behavior not defined by the C standard. Commonly, printf continues reading memory beyond the array, printing whatever happens to be there until it reaches a null character.
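If an array really is meant to stay unterminated, one way to print or copy it without running off the end is to give %s an explicit precision, which caps how many bytes are read. The sketch below assumes a hypothetical helper named format_bounded; only snprintf and the "%.*s" conversion are standard.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (illustration only): writes at most `count`
   characters from `chars`, which need not be null-terminated, into `out`
   as a properly terminated string. The precision in "%.*s" limits how
   many bytes are read from `chars`. Returns what snprintf returns. */
int format_bounded(char *out, size_t out_size, const char *chars, int count)
{
    return snprintf(out, out_size, "%.*s", count, chars);
}
```

The same idea works directly with printf, for example printf("%.5s", c2) for a 5-element array.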
Footnote
1 This assumes you are indeed using a C compiler to compile this source code. C++ has different rules and does not permit an array being initialized with a string literal to be too short to include the terminating null character.

Related

Is NUL set automatically, if I provide an extra element for it in the declaration of the respective char array?

Is '\0' set automatically if I provide an extra element for it, but leave it out of the initialization string?
Like:
char a[6] = {"Hello"}; // <- Is NUL set here automatically?
I did one experiment with C and C++:
C:
#include <stdio.h>
int main()
{
char NEWYEAR[16] = {"Happy New Year!"};
printf("%s\n",NEWYEAR);
return 0;
}
Output:
Happy New Year!
C++:
#include <iostream>
int main()
{
char NEWYEAR[16] = {"Happy New Year!"};
std::cout << NEWYEAR << std::endl;
return 0;
}
Output:
Happy New Year!
The compilers did not throw an error or warning, and the result is as desired. So it might seem to work correctly. But is that really true?
Is everything correct by doing so?
Is this maybe bad programming style?
Does this cause any issues?
It is more complex than that.
char a[6] = "Hello";
will initialize the array of characters to Hello\0, because Hello has an implicit terminating zero.
char a[6] = "Hello\0";
would be valid in C, but invalid in C++ because the literal is 7 characters long, having both an implicit terminator and explicit embedded null character. C allows the literal to drop the implicit terminator. C11 6.7.9p14:
An array of character type may be initialized by a character string literal or UTF-8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
This means that the implicit terminating null is added only if there is room in the array; it does not have to be.
char a[5] = "Hello";
would be valid C, resulting in a char array that does not contain a zero-terminated string. It is invalid in C++.
And
char a[4] = "Hello";
in C would yield the characters Hell, because while it is a constraint violation in C (C11 6.7.9p2),
No initializer shall attempt to provide a value for an object not contained within the entity being initialized.
providing more initializers than there are elements to initialize usually just generates a warning in many compilers, which programmers then often ignore. Paragraph 14 does not make an exception for anything besides the implicit terminator.
And lastly
char a[7] = "Hello";
in both C and C++ would result in a character array of 7 elements containing the characters Hello\0\0, because in an array having an initializer, the elements not explicitly initialized by the initializer will be default-initialized as if initialized by literal 0. In this case the first 6 elements will be initialized explicitly and the 7th implicitly.
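The zero fill described above can be checked with a small sketch. The helper name trailing_zeros is hypothetical and exists only for this illustration.

```c
#include <stddef.h>

/* Hypothetical helper (illustration only): counts how many consecutive
   zero bytes sit at the end of an array of `size` chars. */
size_t trailing_zeros(const char *buf, size_t size)
{
    size_t count = 0;
    while (count < size && buf[size - 1 - count] == '\0')
        count++;
    return count;
}
```

With char a[7] = "Hello"; the call trailing_zeros(a, sizeof a) should give 2: the literal's own terminator plus one implicitly zeroed element.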
Given the possibility of silently truncating the terminator in C, it is better to just omit the array size and write
char a[] = "Hello";
This will declare a as array of 6 elements, just like char a[6] = "Hello";, but you cannot mistype the array size.
If there's space for the null-terminator then it will be added.
In C (but not C++), if the size of the array equals the length of the string excluding the null-terminator, then the null-terminator will not be added. So e.g.
char a[5] = "Hello";
is valid, but there won't be a null-terminator in the array.
It's not valid to provide a smaller size than the string length.

Why does C++ need array of 6 size to store 5 letter word whereas C allows just 5?

I have tried this following statement in C and C++.
char A[5] = {"Hello"};
While C accepts this, C++ is throwing an error saying the string is too long. If there is a null character to be added, why is it accepted in C but not in C++?
Please note that char A[5]={"Hello"}; is a bug in either language. There must be room to allocate the null terminator.
It compiles in C because the language (6.7.9/14) has an odd special rule/language bug:
An array of character type may be initialized by a character string literal or UTF-8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
This allows a character array to be initialized with a string literal that has exactly as many characters as the array has elements, but it silently discards the null termination.
C++ fixed this dangerous language bug.
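In C code, one possible guard against the silent truncation is to compare sizes explicitly. This is only a sketch: the macro name HAS_ROOM_FOR is hypothetical, and it relies on the fact that sizeof applied to a string literal counts the terminating null character.

```c
#include <stddef.h>

/* Hypothetical macro (illustration only): true when `array` can hold
   every byte of the string literal, including its '\0'.
   sizeof("Hello") is 6, not 5, because the terminator is counted. */
#define HAS_ROOM_FOR(array, literal) (sizeof(literal) <= sizeof(array))
```

Because both operands are sizeof expressions, the result is a constant expression, so it could also be used inside a C11 _Static_assert to reject a too-small array at compile time.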

sprintf() is adding an extra variable

Why is this happening?:
char buf[256];
char date[8];
sprintf(date, "%d%02d%02d", Time.year(), Time.month(), Time.day());
snprintf(buf, sizeof(buf), "{\"team\":\"%s\"}", team.c_str());
Serial.println(date);
output:
20180202{"team":"IND"}
it should only be: 20180202
I don't know why {"team":"IND"} is getting added to the end of it.
Very likely the two arrays you declared are laid out in memory in a way that allows the write to buf to overwrite the null terminator of date, so the two strings appear to be "concatenated".
I can't write code to reproduce this because it's undefined behavior and thus not reliable. But I can tell you how you can avoid it:
snprintf(date, sizeof(date), "%d%02d%02d", Time.year(), Time.month(), Time.day());
snprintf(buf, sizeof(buf), "{\"team\":\"%s\"}", team.c_str());
This would print an incorrect (truncated) value, because date is still too small for the full text, but it would not cause any unexpected behavior.
Having said that, why are you using snprintf() when this appears to be C++? There are more suitable solutions for this kind of problem.
Strings in C are simply arrays with a special arrangement. If the string has n printable characters, it should be stored in an array of size n + 1, so that you can add what is called a null terminator. It's a special value that indicates the end of the string.
Your second snprintf() is overwriting that null terminator of the date array, thus appearing to concatenate both strings.
You have reserved space to store exactly 8 chars:
char date[8];
To store the date properly 20180202 you need
char date[9];
because sprintf() puts an extra '\0' character into the buffer you pass, for proper C-style string termination.
I'd suspect you declared your buffers like
char buffer[???];
char date[8];
Since these are most likely stored on your processor's stack, you need to read that backwards: the output placed in buffer overwrites that terminating '\0', and so appears immediately after date.
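Putting the advice from these answers together, a bounded version of the date formatting might look like the sketch below. The function name format_date and its parameters are hypothetical; the year, month, and day are passed in as plain ints rather than read from the Time object in the question.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (illustration only): formats a date as YYYYMMDD
   into `out`, never writing more than `size` bytes. Returns 1 if the
   whole text plus its '\0' fit, 0 if the output was truncated or
   snprintf reported an error. */
int format_date(char *out, size_t size, int year, int month, int day)
{
    int needed = snprintf(out, size, "%d%02d%02d", year, month, day);
    return needed >= 0 && (size_t)needed < size;
}
```

With char date[9]; the call succeeds and date holds 20180202. With char date[8]; the function reports failure and the buffer holds the truncated but still terminated text 2018020.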

Is a c-style string containing only one char considered a string?

Is a c-style string containing only one char considered a string or would you call that construct a char?
Zero or more characters followed by a NUL-terminator is a C-style string. You can use the double quotation character notation to define a literal.
A character constant such as '3' represents a single char (in C its type is actually int, but its value fits into a char).
Something like '34' is a multicharacter literal.
A one element buffer is still technically a buffer. Forming a pointer to the start of it is not at all affected by how many items are in it.
So no, it's not a char. Furthermore, even the type system would differentiate char[1] from char.
It's also worth noting that you may be surprised by what a one-character string is: "a" has two characters in its buffer, not one. The only one-character buffer that is a valid C string is the empty string.
Is a c-style string containing only one char considered a string or
would you call that construct a char?
Indeed, a C-style string is a string, which is quite different from the char data type. C has no dedicated built-in type for representing and manipulating strings, the way C++ has std::string, so one has to use null-terminated character arrays, e.g. char str[SIZE] = "something", to represent strings. A single character, on the other hand, is stored in a char, which is altogether different from char[]. These two things are not the same!
Example,
char str[] = "a"; // sizeof(str) gives 2 because of the terminating null character
char c = 'a'; // simply a single character
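The "one extra element" rule from the answers above can be written down as a tiny sketch. The function name bytes_needed is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (illustration only): the number of chars a buffer
   needs in order to store the string `s`, terminator included. */
size_t bytes_needed(const char *s)
{
    return strlen(s) + 1;
}
```

So bytes_needed("a") should be 2, and the empty string still needs 1 byte, which is why a char[1] can hold only the empty string.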

Why isn't strlen working for me?

char p[4]={'h','g','y'};
cout<<strlen(p);
This code prints 3.
char p[3]={'h','g','y'};
cout<<strlen(p);
This prints 8.
char p[]={'h','g','y'};
cout<<strlen(p);
This again prints 8.
Please help me as I can't figure out why three different values are printed by changing the size of the array.
strlen starts at the given pointer and advances until it reaches the character '\0'. If you don't have a '\0' in your array, it could be any number until a '\0' is reached.
Another way to reach the number you're looking for (in the case you've shown) is by using: int length = sizeof(p)/sizeof(*p);, which will give you the length of the array. However, that is not strictly the string length as defined by strlen.
As @John Dibling mentions, the reason that strlen gives the correct result on your first example is that you've allocated space for 4 characters, but only used 3; the remaining 1 character is automatically initialized to 0, which is exactly the '\0' character that strlen looks for.
Only your first example has a null terminated array of characters - the other two examples have no null termination, so you can't use strlen() on them in a well-defined manner.
char p[4]={'h','g','y'}; // p[3] is implicitly initialized to '\0'
char p[3]={'h','g','y'}; // no room in p[] for a '\0' terminator
char p[]={'h','g','y'}; // p[] implicitly sized to 3 - also no room for '\0'
Note that in the last case, if you used a string literal to initialize the array, you would get a null terminator:
char p[]= "hgy"; // p[] has 4 elements, last one is '\0'
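When it is not known whether an array is terminated, a length function that also takes the array's capacity stays well-defined. The sketch below uses a hypothetical name, bounded_length; POSIX offers strnlen for the same purpose, but that function is not part of standard C.

```c
#include <stddef.h>

/* Hypothetical helper (illustration only): returns the number of chars
   before the first '\0', but never looks at more than `capacity`
   elements, so it is safe for unterminated arrays too. */
size_t bounded_length(const char *buf, size_t capacity)
{
    size_t i = 0;
    while (i < capacity && buf[i] != '\0')
        i++;
    return i;
}
```

Called as bounded_length(p, sizeof p), this should give 3 for all three arrays in the question, because the scan stops at the array boundary when no terminator is present.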
That will get you a random number. strlen requires that strings be terminated with a '\0' to work.
try this:
char p[4]={'h','g','y', '\0'};
strlen is a standard library function that works with strings (in the C sense of the term). A string is defined as an array of char values that ends with a \0 value. If you supply something that is not a string to strlen, the behavior is undefined: the code might crash, the code might produce meaningless results, etc.
In your examples only the first one supplies strlen with a string, which is why it works as expected. In the second and the third case, what you supply is not a string (not terminated with \0), which is why the results expectedly make no sense.
Terminate your char buffer with '\0'.
char p[4]={'h','g','y', '\0'};
This is because strlen() expects to find a null-terminator for the string. In this case, you don't have it, so strlen() keeps counting until it finds a \0 or gives a memory access violation and your program dies. RIP!