Behaviour of sizeof() operator - c++

For the following piece of code:
char a[] = "Apple";
char *s[] = {"Apple"};
printf("%zu %zu\n", sizeof(a), sizeof(s[0]));
The output is:
6 4
Can someone tell me why sizeof() is giving different outputs?
EDIT: I did intend to type sizeof() originally, but typed strlen() instead. I apologize for the same. I have edited the statement now.

sizeof(a) is the size of the array, which contains the terminator as well as the 5 printable characters.
strlen(s[0]) gives the length of the string, excluding the terminator, since that is what strlen is specified to do.
UPDATE: sizeof(s[0]) is the size of a pointer. There's no way to determine the size of an array given just a pointer to it.
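The three quantities discussed above can be put side by side in a minimal C sketch. The function names (array_size, pointed_length, pointer_size, print_sizes) are made up for illustration and do not come from the question; the pointer size is whatever the platform uses, so no fixed value is assumed for it.

```c
#include <stdio.h>
#include <string.h>

/* Size of the whole array: 5 letters plus the terminating '\0'. */
size_t array_size(void)
{
    char a[] = "Apple";
    return sizeof(a);
}

/* Length of the string the pointer refers to, terminator excluded. */
size_t pointed_length(void)
{
    const char *s[] = {"Apple"};
    return strlen(s[0]);
}

/* Size of the pointer itself; it depends on the platform, not on the string. */
size_t pointer_size(void)
{
    const char *s[] = {"Apple"};
    return sizeof(s[0]);
}

/* Prints the three values; %zu is the printf conversion for size_t. */
void print_sizes(void)
{
    printf("%zu %zu %zu\n", array_size(), pointed_length(), pointer_size());
}
```

Calling print_sizes() from main shows the array size, the string length and the pointer size in that order.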

sizeof gives you the number of chars allocated to a, while strlen gives you the length of the usable string in a.

The \0 counts toward the size of the string in memory, but the length reported by strlen() covers only the characters before the first \0.

A character array declared like this:
char a[] = "Apple";
has, according to the language specification, a null-terminator. Therefore, the length of the array is 6. There are 5 characters, and then the null terminator.
On the other hand, strlen() returns the number of characters that precede the null terminator, which is 5.

The sizeof operator yields the size (in bytes) of its operand.
In this statement
char a[] = "Apple";
array a is initialized with the characters of the string literal "Apple", which includes the terminating zero.
In fact this declaration is equivalent to
char a[] = { 'A', 'p', 'p', 'l', 'e', '\0' };
So the size in bytes of a is equal to 6.
The standard C function strlen counts the characters in a string until it encounters the terminating zero. So
strlen( a )
will return 5, which is the number of characters in the array before the terminating zero.
Take into account that you could write, for example,
char a[100] = "Apple";
In this case sizeof( a ) will yield 100 because you explicitly specified the number of bytes that the array occupies. However, only the first 6 elements come from the string literal; the remaining elements are zero-initialized. So how do you find out how much actual data the character array holds? That is what strlen is for: it distinguishes the number of characters actually stored from the size of the array.
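As a rough sketch of that distinction, the helpers below (hypothetical names, not part of the answer) report the capacity of the array, the length of the stored string, and how many more characters would still fit while leaving room for the terminator.

```c
#include <string.h>

/* Bytes reserved for the array: fixed by the declaration, not by the contents. */
size_t buffer_capacity(void)
{
    char a[100] = "Apple";
    return sizeof(a);
}

/* Characters actually stored before the first '\0'. */
size_t stored_length(void)
{
    char a[100] = "Apple";
    return strlen(a);
}

/* Characters that could still be appended while keeping one byte for '\0'. */
size_t remaining_room(void)
{
    char a[100] = "Apple";
    return sizeof(a) - strlen(a) - 1;
}
```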

Because there is no string type in C. A string is a character array that is null-terminated. strlen() counts the characters up to the null character, whereas sizeof returns the amount of memory occupied by the character array.

a is an array of chars, which contains 6 elements, hence sizeof returns 6 (the length of the string including zero termination).
s is an array of pointers to char. On your platform a pointer is 4 bytes. sizeof(s[0]) returns the size of the first element, which is a pointer, so the result is 4.

When you define:
char a[] = "Apple";
It defines an array of characters, equivalent to the following definition:
char a[] = {'A', 'p', 'p', 'l', 'e', '\0'}; // '\0' is the string termination character which equals to 0
Since the size of char is 1, sizeof(a) returns 6, which is the size of the whole array.
Nevertheless, when you define:
char *s[] = {"Apple"};
It defines an array of pointers to char. Hence sizeof(s[0]) returns the size of its first element, which equals sizeof(char*).
On a 32-bit platform, sizeof(char*) is typically 4. On a 64-bit platform, 8 is the expected value.
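Because s is an array of pointers, sizeof(s) scales with the number of pointers and never with the length of the strings they point to. A small sketch, with a second element "Banana" added purely for illustration and hypothetical function names:

```c
#include <stddef.h>

/* Total bytes of the pointer array: number of elements times sizeof(char *). */
size_t pointer_array_bytes(void)
{
    const char *s[] = {"Apple", "Banana"};
    return sizeof(s);
}

/* Number of pointers stored in the array. */
size_t pointer_array_count(void)
{
    const char *s[] = {"Apple", "Banana"};
    return sizeof(s) / sizeof(s[0]);
}
```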

Related

Is the "NUL" character ("\0") byte separated or is it assigned to that char-array with the string?

If I define a string:
char array[5] = {"hello"};
Is the NUL character (\0) byte added "hidden" to "array[5]", so that the array occupies 6 bytes in memory rather than 5?
Or does the NUL byte lie "separated" from "array[5]", placed in memory right after the last element of the char array but not belonging to "array[5]"?
If I wrote this:
i = strlen(array);
printf("The Amount of bytes preserved for array: %d",i);
What would be the result for the number of bytes reserved for array[5]?
Does the "NUL" character ("\0") byte lie separated after the last element of char-array in the memory or is it assigned to that char-array?
No. Neither answer is correct. See below for details.
Answer for C:
If you write your code like that, with an explicit size that is too small for the terminator, array will have exactly 5 elements and there will be no NUL character.
strlen(array) has undefined behavior because array is not a string (it has no terminator). char array[5] = {"hello"}; is equivalent to char array[5] = {'h', 'e', 'l', 'l', 'o'};.
On the other hand, if you write
char array[] = "hello";
it is equivalent to
char array[6] = {'h', 'e', 'l', 'l', 'o', '\0'};
and strlen will report 5.
The relevant part of the C standard is:
An array of character type may be initialized by a character string literal, optionally
enclosed in braces. Successive characters of the character string literal (including the
terminating null character if there is room or if the array is of unknown size) initialize the
elements of the array.
(Emphasis mine.)
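When it is unclear whether a buffer holds a terminator, one option in C is to search for it within the known size instead of calling strlen. The sketch below is only an illustration: has_terminator and the two wrapper functions are hypothetical helpers, and the five-element array is written with a character list, which the answer above describes as equivalent to char array[5] = {"hello"}.

```c
#include <string.h>

/* Returns 1 if a '\0' occurs within the first cap bytes of buf, otherwise 0. */
int has_terminator(const char *buf, size_t cap)
{
    return memchr(buf, '\0', cap) != NULL;
}

/* Exactly five elements: no room for a terminator, so this is not a string. */
int five_element_array_is_string(void)
{
    char array[5] = {'h', 'e', 'l', 'l', 'o'};
    return has_terminator(array, sizeof(array));
}

/* Size deduced from the literal: six elements, the last one is '\0'. */
int unsized_array_is_string(void)
{
    char array[] = "hello";
    return has_terminator(array, sizeof(array));
}
```

The search never reads past sizeof(array), which is the point of passing the size along with the pointer.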
Answer for C++:
Your code is invalid. [dcl.init.string] states:
There shall not be more initializers than there are array elements. [ Example:
char cv[4] = "asdf"; // error
is ill-formed since there is no space for the implied trailing '\0'. — end example ]
In C++, the initializer "hello" needs six bytes (five letters plus the terminating '\0'), but array[5] has room for only five. Therefore, the array declaration is ill-formed.
Alternatively, this works: char array[6] = {"hello"}.

Why can I assign a string literal whose length is less than the array itself?

I'm a bit baffled that this is allowed:
char num[6] = "a";
What is happening here? Am I assigning a pointer to the array or copying the literal values into the array (and therefore I'm able to modify them later)?
Why can I assign a string literal less than the array itself? What is happening here?
This is well defined. When initializing a character array with a string literal:
If the size of the array is specified and it is larger than the number
of characters in the string literal, the remaining characters are
zero-initialized.
So,
char num[6] = "a";
// equivalent to char num[6] = {'a', '\0', '\0', '\0', '\0', '\0'};
Am I assigning a pointer to the array or copying the literal values into the array (and therefore I'm able to modify them later)?
The value will be copied, i.e. the elements of the array will be initialized by the chars of the string literal (including '\0').
String literals can be used to initialize character arrays. If an array is initialized like char str[] = "foo";, str will contain a copy of the string "foo".
Successive characters of the string literal (which includes the implicit terminating null character) initialize the elements of the array.
char num[6] = "a";
is equivalent to
char num[6] = {'a', '\0', '\0', '\0', '\0', '\0'};
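Both claims, the zero fill and the fact that the array holds its own modifiable copy, can be checked with a short C sketch. The function names are invented for this illustration.

```c
#include <string.h>

/* Counts the zero bytes in num right after it is initialized from "a". */
size_t count_zero_bytes(void)
{
    char num[6] = "a";
    size_t zeros = 0;
    for (size_t i = 0; i < sizeof(num); ++i) {
        if (num[i] == '\0') {
            ++zeros;
        }
    }
    return zeros;
}

/* The array owns a copy of the characters, so writing to it is allowed. */
size_t length_after_modification(void)
{
    char num[6] = "a";
    num[0] = 'b';
    num[1] = 'c';       /* num[2] is still '\0' from the zero fill */
    return strlen(num);
}
```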
Why can I assign a string literal less than the array itself?
This is allowed by the language. It is often useful to be able to add more characters to the array later, which wouldn't be possible if the existing characters filled the entire array.
Am I assigning a pointer to the array
No. You cannot assign a pointer to an array, so that is not happening.
or copying the literal values into the array
That is exactly what is happening.
and therefore I'm able to modify them later
You are able to modify the array, indeed.
Just use char num[6] = {"a"};. It works.
This kind of declaration is a special piece of syntactic sugar. It's equivalent to
char num[6] = {'a', 0};
The array is always modifiable. Its contents after such a declaration would be a character representing 'a', a zero (NUL terminator) and the remainder of the array will also be zeroed (zero initialization).
That is one form of declaration, which is equivalent to
char num[6] = {'a','\0'};
You declared a C string that can hold at most 5 ordinary chars; the final element must hold \0 to end the C string.
For the declaration you can use
char num[6] = "a";
and then assign a new value later, either
with strcpy(dest,src):
strcpy(num,"test");
or char by char:
num[0]='t';
num[1]='e';
num[2]='s';
num[3]='t';
num[4]='\0';
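strcpy does not know how large the destination is, so a longer source would overflow num. One possible way to guard against that, sketched here with a hypothetical helper copy_checked built on the standard snprintf, is to pass the capacity along and report truncation:

```c
#include <stdio.h>
#include <string.h>

/* Copies src into dst (capacity cap, including the terminator).
   Returns 1 if the whole string fit, 0 if it had to be truncated. */
int copy_checked(char *dst, size_t cap, const char *src)
{
    int n = snprintf(dst, cap, "%s", src);
    return n >= 0 && (size_t)n < cap;
}

/* "test" needs 5 bytes including '\0', so it fits into num[6]. */
int short_string_fits(void)
{
    char num[6] = "a";
    return copy_checked(num, sizeof(num), "test");
}

/* "toolong" needs 8 bytes, so it is cut off but still terminated. */
size_t length_after_truncation(void)
{
    char num[6] = "a";
    copy_checked(num, sizeof(num), "toolong");
    return strlen(num);
}
```

This is a design choice rather than the only option: snprintf always terminates the destination when cap is nonzero, which keeps num a valid string even when the source is too long.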

Why does sizeof("string") prints one more than the length of "string"? [duplicate]

#include <stdio.h>
int main()
{
    int a = sizeof("string");
    printf("%d", a);
    return 0;
}
The above code prints 7 as the output while the length of string is only 6. Could someone please explain?
This is what is happening:
The string literal "string" is an array of (const) char, including a null-terminator character, i.e. {'s', 't', 'r', 'i', 'n', 'g', '\0'}. In your example, said array has 7 elements.
The operator sizeof when applied to an array yields its size in bytes.
The size of an array is the sum of the size of each of its elements.
The size of one char is 1.
So, you get the number of chars explicit in the literal, plus a null-terminator.
This behaviour is the same in both C and C++
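The counting described above can be verified directly. In this small sketch the function names are hypothetical; note that sizeof yields a size_t, so %zu is the matching printf conversion if the values are printed.

```c
#include <string.h>

/* Size of the literal's array: six letters plus the terminating '\0'. */
size_t literal_size(void)
{
    return sizeof("string");
}

/* Length as seen by strlen: the terminator is not counted. */
size_t literal_length(void)
{
    return strlen("string");
}

/* Even the empty literal occupies one byte, the terminator itself. */
size_t empty_literal_size(void)
{
    return sizeof("");
}
```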
All strings in C, even the ones you write like "string", are terminated with a null byte '\0'.
The expression "string" in C or C++ code is a string literal, that is, a char array that looks like this: {'s','t','r','i','n','g','\0'}. In C and C++ all strings must be terminated by a null byte in memory; otherwise functions cannot identify their end.
This is, by the way, also why you must add 1 when calculating the length of character arrays in C code, to account for the null byte. So if you want to store "string" in a character array that you declare, you would have to use char array[7]; with 6 there is no room for the terminator, and treating the array as a string leads to undefined behaviour.
The length of the string is 6 and the null character is also added by default when we specify the string in "". So it prints 7.
Strings are null-terminated and stored as an array containing the characters and terminated with a null character ('\0', called NUL in ASCII). So you have:
s | t | r | i | n | g | \0 |
^-- 7th char
Because "string" is implicitly adding a '\0' character.

Difference in the initialisation of character array

I do the following for character array initialisation:
char a[] = "teststring";
char b[]={'a','a','b','b','a'};
For the first, if I need the string length, I must use strlen(a), while for the other I should use sizeof(b)/sizeof(b[0]).
Why this difference?
EDIT : (I got this)
char name[10]="StudyTonight"; // quoted as valid, but 12 characters plus '\0' do not fit in 10 elements
char name[10]={'L','e','s','s','o','n','s','\0'}; //valid initialization
Remember that when you initialize a character array by listing all its characters separately, you must supply the '\0' character explicitly.
I get that with char b we have to add '\0' to make a proper initialisation.
ANOTHER :
Therefore, the array of char elements called myword can be initialized with a null-terminated sequence of characters by either one of these two statements:
char myword[] = { 'H', 'e', 'l', 'l', 'o', '\0' };
char myword[] = "Hello";
A string literal, like "teststring" contains the characters between the double-quotes, plus a terminating char with value zero. So
char a[] = "ab";
has the same effect as
char a[] = {'a', 'b', '\0'};
strlen() searches for that character with value '\0'. So strlen(a) in this case will return 2.
Conversely, sizeof gets the actual size of the memory used. Since sizeof(char) is 1, by definition in the standard, this means sizeof(a) gives the value 3: it counts the 'a', the 'b', and the '\0'.
a is a C-style string, i.e., a null-terminated char array. The initialization is equivalent to:
char a[] = {'t','e','s','t','s','t','r','i','n','g','\0'};
b, however, is not null-terminated, so it's not a C-style string. You can't use functions like std::strlen() on it because they are only valid for C-style strings.
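An array like b can still be printed or formatted safely as long as the code tells the library how many characters to read. In C, a precision on %s does that. The sketch below uses snprintf so the result can be inspected; format_unterminated is a hypothetical helper name.

```c
#include <stdio.h>

/* Formats the unterminated array b into out.
   The precision given through %.*s limits how many chars are read,
   so no terminator is needed in b itself.
   Returns the number of characters the formatted result needs. */
int format_unterminated(char *out, size_t out_cap)
{
    char b[] = {'a', 'a', 'b', 'b', 'a'};
    return snprintf(out, out_cap, "%.*s", (int)sizeof(b), b);
}
```

The same "%.*s" form works with printf when the goal is only to print the characters.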
String literals are expanded into char arrays that also include the terminating zero char. So think of
char a[] = "teststring";
as if you had typed this:
char a[] = {'t','e','s','t','s','t','r','i','n','g','\0'};
A rule of thumb
Whenever you will use strlen() on a char array, use a string literal for its initialization. The strlen function can be thought of as a simple scan for the terminating zero char (\0), counting the iterations needed.
A word about sizeof
Even though it is often written with parentheses, sizeof is an operator, an integral part of the C++ language (inherited from C). In cases like char c[] = "hello";, sizeof(c) will return 6, which is exactly 1 more than strlen(c), and you might be thinking: "let's skip that inefficient scanning for the terminator". But sizeof stops being that "efficient" as soon as it works on pointers, and arrays can (and will) be used as pointers whenever required. Look at the following example:
#include <iostream>
// naive approach, don't do that
int myarraysize(char s[])
{
    return sizeof(s);
}
int main ()
{
    char c[] = "hello";
    std::cout << sizeof(c) << " vs " << myarraysize(c) << std::endl;
    return 0;
}
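A common way around this, sketched below in C under the assumption that the caller still has the real array in scope, is to pass the size as a separate argument. The helper names (size_seen_by_callee, fill_buffer, filled_length) are made up for this example.

```c
#include <string.h>

/* Inside the callee only a pointer is visible, so sizeof reports
   the pointer size, not the size of the caller's array. */
size_t size_seen_by_callee(const char *s)
{
    return sizeof(s);
}

/* The caller supplies the size; the callee fills the buffer with c
   and keeps the last byte for the terminator. */
void fill_buffer(char *buf, size_t size, char c)
{
    if (size == 0) {
        return;
    }
    memset(buf, c, size - 1);
    buf[size - 1] = '\0';
}

/* The caller sees the array itself, so sizeof(c) is the full 6 bytes. */
size_t filled_length(void)
{
    char c[] = "hello";
    fill_buffer(c, sizeof(c), 'x');
    return strlen(c);
}
```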
You can always write
char b[]={'a','a','b','b','a','\0'};
to overcome "the difference".
Also note
sizeof(b)/sizeof(b[0])
essentially boils down to
sizeof(b)
since sizeof(char) is always 1. Your formula is used for any other array element types.
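The division matters once the elements are larger than one byte. A minimal sketch, where ARRAY_LEN is a hypothetical macro name and the int array is added only for comparison:

```c
#include <stddef.h>

/* Element count of a real array (not of a pointer). */
#define ARRAY_LEN(arr) (sizeof(arr) / sizeof((arr)[0]))

/* For char, the element count and the byte count coincide. */
size_t char_array_count(void)
{
    char b[] = {'a', 'a', 'b', 'b', 'a'};
    return ARRAY_LEN(b);
}

/* For int, the byte count is larger than the element count. */
size_t int_array_count(void)
{
    int v[] = {10, 20, 30};
    return ARRAY_LEN(v);
}

size_t int_array_bytes(void)
{
    int v[] = {10, 20, 30};
    return sizeof(v);
}
```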

Why isn't strlen working for me?

char p[4]={'h','g','y'};
cout<<strlen(p);
This code prints 3.
char p[3]={'h','g','y'};
cout<<strlen(p);
This prints 8.
char p[]={'h','g','y'};
cout<<strlen(p);
This again prints 8.
Please help me as I can't figure out why three different values are printed by changing the size of the array.
strlen starts at the given pointer and advances until it reaches the character '\0'. If you don't have a '\0' in your array, it reads past the end of the array, and the result could be any number, depending on where a '\0' happens to be found.
Another way to reach the number you're looking for (in the case you've shown) is by using: int length = sizeof(p)/sizeof(*p);, which will give you the length of the array. However, that is not strictly the string length as defined by strlen.
As @John Dibling mentions, the reason that strlen gives the correct result on your first example is that you've allocated space for 4 characters, but only used 3; the remaining 1 character is automatically initialized to 0, which is exactly the '\0' character that strlen looks for.
Only your first example has a null terminated array of characters - the other two examples have no null termination, so you can't use strlen() on them in a well-defined manner.
char p[4]={'h','g','y'}; // p[3] is implicitly initialized to '\0'
char p[3]={'h','g','y'}; // no room in p[] for a '\0' terminator
char p[]={'h','g','y'}; // p[] implicitly sized to 3 - also no room for '\0'
Note that in the last case, if you used a string literal to initialize the array, you would get a null terminator:
char p[]= "hgy"; // p[] has 4 elements, last one is '\0'
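The cases that are well defined can be checked in a few lines of C. The function names below are hypothetical, and the sketch deliberately never calls strlen on an unterminated array, since that would be undefined behavior.

```c
#include <string.h>

/* p[3] has no explicit initializer, so it is zero-initialized. */
int fourth_element_is_terminator(void)
{
    char p[4] = {'h', 'g', 'y'};
    return p[3] == '\0';
}

/* Because of that implicit '\0', strlen is safe here. */
size_t length_of_padded_array(void)
{
    char p[4] = {'h', 'g', 'y'};
    return strlen(p);
}

/* A character list sizes the array to exactly the listed elements. */
size_t size_from_char_list(void)
{
    char p[] = {'h', 'g', 'y'};
    return sizeof(p);
}

/* A string literal adds one element for the terminator. */
size_t size_from_literal(void)
{
    char p[] = "hgy";
    return sizeof(p);
}
```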
That will get you an unpredictable number. strlen requires that strings be terminated with a '\0' to work.
Try this:
char p[4]={'h','g','y', '\0'};
strlen is a standard library function that works with strings (in C sense of the term). String is defined as an array of char values that ends with a \0 value. If you supply something that is not a string to strlen, the behavior is undefined: the code might crash, the code might produce meaningless results etc.
In your examples only the first one supplies strlen with a string, which is why it works as expected. In the second and the third case, what you supply is not a string (not terminated with \0), which is why the results expectedly make no sense.
Terminate your char buffer with '\0':
char p[4]={'h','g','y', '\0'};
This is because strlen() expects to find a null-terminator for the string. In this case, you don't have it, so strlen() keeps counting until it finds a \0 or gives a memory access violation and your program dies. RIP!