Why do we need a null terminator in C++ strings?

I'm new to programming and very new to C++, and I recently came across strings.
Why do we need a null terminator at the end of a character list?
I've read answers saying that, since we might not use all the elements of an array, we need the null terminator so the program knows where the string ends, e.g. char john[100] = "John"
But why can't the program just loop through the array to check how many elements are filled and so determine the length?
And if only four characters of the array are filled for the word "John", what are the other elements filled with?

The other characters in the array char john[100] = "John" would be filled with zeros, each of which is a null character. In general, when you initialize an array and don't provide enough elements to fill it, the remaining elements are value-initialized, which for built-in types such as int and char means they are set to zero:
int foo[3] {5}; // this is {5, 0, 0}
int bar[3] {}; // this is {0, 0, 0}
char john[5] = "John"; // this is {'J', 'o', 'h', 'n', 0}
char peter[5] = "Peter"; // ERROR, initializer string too long
// (one null-terminator is mandatory)
Also see cppreference on Array initialization. To find the length of such a string, we just loop through the characters until we find 0 and exit.
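The loop described above might look like the following minimal sketch. The name count_until_null is made up for illustration; in real code std::strlen from <cstring> does the same job.

```cpp
#include <cstddef>

// Illustrative helper (not a standard function): counts characters
// up to, but not including, the first '\0'.
std::size_t count_until_null(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}
```

Called on char john[100] = "John", this returns 4, because john[4] is the first zero it meets.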
The motivation behind null-terminating strings in C++ is to ensure compatibility with C-libraries, which use null-terminated strings. Also see What's the rationale for null terminated strings?
Containers like std::string don't rely on a null terminator to find the end of the string and can even store a string containing null characters. This is because they store the size of the string separately. However, since C++11 the character array of a std::string is guaranteed to be null-terminated as well, so that std::string::c_str() can return it without modifying or copying the underlying array.
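As a small sketch of the difference, the helpers below (hypothetical names) build a std::string with an embedded null character and then measure it the way a C function would. It assumes C++11 or later.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Builds a std::string holding "abc", an embedded '\0', then "def".
// The (pointer, count) constructor copies exactly 7 chars and does not
// stop at the embedded null.
std::string make_with_embedded_null() {
    return std::string("abc\0def", 7);
}

// What a C function sees through c_str(): it stops at the first '\0'.
std::size_t c_view_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

Here size() reports 7, while the C-style view reports 3, because std::strlen has no way of knowing that more data follows the first '\0'.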
C++-only libraries will rarely, if ever, pass C-strings between functions.

The existence of a null terminator is a design decision. The purpose it serves is marking the end of the string. There are other ways to do this; for example, in Pascal the first element of a string is its size, so no null terminator is needed.
In the example you give, only the first 5 elements of the array are initialized from the literal; the rest are zero-initialized. Notice how I said 5 elements and not just four. The fifth element is the null terminator.
Sure, the program can loop through the string to find out its length, but how will it know when to stop looping?
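To make the length-prefixed alternative mentioned above concrete, here is a rough sketch of such a layout in C++. The struct and function names are invented for illustration and do not reproduce any particular Pascal implementation.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical Pascal-style layout: the length is stored up front,
// so no terminator is needed. Names are illustrative only.
struct LengthPrefixed {
    unsigned char length;   // number of characters in use (max 255)
    char data[255];         // characters, not null-terminated
};

// Fills a LengthPrefixed from a C-style string, truncating at 255 chars.
LengthPrefixed from_c_string(const char* s) {
    LengthPrefixed out{};
    std::size_t n = std::strlen(s);
    if (n > 255) {
        n = 255;
    }
    out.length = static_cast<unsigned char>(n);
    std::memcpy(out.data, s, n);
    return out;
}
```

The trade-off is visible in the types: finding the length is a single read, but the one-byte prefix caps the string at 255 characters.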

The nul terminator is what tells you what spaces are filled. Everything up to and including the nul terminator has been filled. Everything after it has not.
There is no general notion of which elements of an array have been filled. An array holds some number of elements; its size is determined when it is created. All of its elements have some value initially; there's no way, in general, to determine which ones have been assigned a value and which ones have not from looking at the values of the elements.
Strings are arrays of char plus a coding convention: the "end" of the string is marked by a nul character. Most of the string manipulation functions rely on this convention.
A string literal, such as "John", is an array of char. "John" has 5 elements in the array: 'J', 'o', 'h', 'n', '\0'. The function strcpy, for example, copies characters until it sees that nul terminator:
char result[100]; // no meaningful values here
strcpy(result, "John");
After the call to strcpy, the first five elements of result are 'J', 'o', 'h', 'n', and '\0'. The rest of the array elements have no meaningful values.
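A simplified sketch of the copy loop that strcpy performs is shown below. The name copy_c_string is hypothetical, and real library implementations are usually more optimized; the sketch only illustrates that the terminator is what stops the copy.

```cpp
// Simplified sketch of what strcpy does; the real function is
// std::strcpy from <cstring>. dst must have room for the terminator.
char* copy_c_string(char* dst, const char* src) {
    char* out = dst;
    while ((*out++ = *src++) != '\0') {
        // each iteration copies one char; the loop ends right after
        // the '\0' itself has been copied
    }
    return dst;
}
```

Nothing after the copied '\0' is written, which is why the remaining elements of result keep whatever values they had before.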
I would be remiss if I didn't mention that this style of string comes from C, and is often referred to as C-style strings. C++ supports all of the C string stuff, but it also has a more sophisticated notion of a string, std::string, which is completely different. In general, you should be using C++-style strings and not C-style strings.

Related

declare char variable in c++, why need to add 1 for declare the array size [duplicate]

There is a common rule for declaring char arrays.
We should declare them like this -> char ArrayName[Maximum_C-String_Size+1];
For example:
char arr[4+1] = {'a', 'b', 'c', 'd'};
but
char arr[4] = {'a', 'b', 'c', 'd'}; also works.
Why do we need to add 1?
Thanks!

There is no need to do this, unless you are defining something that will be used as a null-terminated string.
// these two definitions are equivalent
char a[5] = { 'a', 'b', 'c', 'd' };
char b[5] = { 'a', 'b', 'c', 'd', '\0' };
If you only want an array with 4 char values in it, and you won't be using that with anything that expects to find a string terminator, then you don't need to add an extra element.

If you’re storing a C-style string in an array, then you need an extra element for the string terminator.
Unlike C++, C does not have a unique string data type. In C, a string is simply a sequence of character values including a zero-valued terminator. The string "foo" is represented as the sequence {'f','o','o',0}. That terminator is how the various string handling functions know where the string ends. The terminator is not a printable character and is not counted towards the length of the string (strlen("foo") returns 3, not 4), however you need to set aside space to store it. So, if you need to store a string that’s N characters long, then the array in which it is stored needs to be at least N+1 elements wide to account for the terminator.
However, if you’re storing a sequence that’s not meant to be treated as a string (you don’t intend to print it or manipulate it with the string library functions), then you don’t need to set aside the extra element.
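The N+1 rule can be expressed as a small check, sketched below with a made-up helper name. It assumes the input really is a null-terminated string.

```cpp
#include <cstddef>
#include <cstring>

// Illustrative check (hypothetical name): a string of N characters
// fits only in an array of at least N + 1 elements.
bool fits_with_terminator(const char* s, std::size_t capacity) {
    return std::strlen(s) + 1 <= capacity;
}
```

For "abcd" the check passes with a capacity of 5 and fails with a capacity of 4, which matches the arr[4+1] rule of thumb.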

In C (also C++), how does the printf("%s",&st[0]) function knows how much it has to print, as we have passed only address of byte of string [duplicate]

For this C code: char st[] = "heloThere"; printf("%s",&st[0]); how does printf know that it has to print only up to the end of heloThere and not beyond, given that we pass only the address of the first byte to the function? Please also cover C++. Also, printf("%s",st) works, even though we are not explicitly passing an address as above. Does printf work differently here? Is it defined separately? (That would be overloading, which C does not support.)

In C and C++, a string is a sequence of characters terminated by the zero character '\0'.
This array declaration
char st[] = "heloThere";
is equivalent to
char st[] = { 'h', 'e', 'l', 'o', 'T', 'h', 'e', 'r', 'e', '\0' };
So this call of printf
printf("%s",&st[0]);
outputs characters pointed to by the expression &st[0] until the zero character is encountered.
This call
printf("%s",st)
is equivalent to the previous call because array designators used in expressions with rare exceptions are converted to pointers to their first elements.
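The equivalence of the two calls can be checked directly. The function below is a small illustrative sketch (its name is made up) that compares the pointer produced by array-to-pointer conversion with the explicit address of element 0.

```cpp
// Returns true when the decayed array and the address of its first
// element are the same pointer, which is why both printf calls agree.
bool decays_to_first_element() {
    char st[] = "heloThere";
    const char* from_array = st;        // array-to-pointer conversion
    const char* from_element = &st[0];  // explicit address of element 0
    return from_array == from_element;
}
```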

How does the printf("%s",&st[0]) function know how much it has to print, as we have passed only the address of the first byte of the string?
It knows it because of the string-terminating null character '\0', which every string needs to provide at its end.
For example the string literal "hello" is stored in memory as "hello\0" and the string consists of 6 characters, not 5.
Note that strlen does not count the terminating \0, so it returns 5, not 6.
Every array of char that is used as a string has to have at least one element reserved for this terminating character. Otherwise, accessing the array through string-handling functions invokes undefined behavior.
If you declare
char st[] = "heloThere";
the compiler automatically calculates the number of elements needed, which is in this case 10, not 9.
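The two numbers can be observed side by side. In the sketch below the helper names are illustrative; sizeof reports the array's element count including the terminator, while std::strlen reports the characters before it.

```cpp
#include <cstddef>
#include <cstring>

char st[] = "heloThere";

// Number of array elements, including the terminator.
std::size_t element_count() { return sizeof(st) / sizeof(st[0]); }

// Number of characters before the terminator.
std::size_t string_length() { return std::strlen(st); }
```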

C and C++ compilers add '\0' at the end of a string literal to mark its end.
char* s = (char*)"Hello!";
In memory, the characters this pointer refers to look like: 'H' 'e' 'l' 'l' 'o' '!' '\0'
Notice the added '\0'. printf loops over the characters, counting them, until it reaches that '\0', and sends that portion of memory to the output stream. printf also returns the final value of that character counter. cout in C++ is more complex, as it is an object and not a function.
Running the code below demonstrates that printf returns the counter value.
cout << printf("%d", printf("%s",s));
Output: Hello!61
Here the inner printf (on the right) first prints "Hello!" and returns 6, the number of characters in "Hello!". The outer printf (on the left) prints that return value, "6", and itself returns 1 because it wrote one character. Finally, cout prints that 1.
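The return value can also be captured on its own rather than nested in another call. A minimal sketch, with a hypothetical wrapper name:

```cpp
#include <cstdio>

// Prints s and returns printf's result: the number of characters
// written (or a negative value on error).
int print_and_count(const char* s) {
    return std::printf("%s", s);
}
```

For "Hello!" this writes the six characters and returns 6; the terminating '\0' is neither printed nor counted.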

why char* passed to FUNCTION always with the len of the string

I have been learning C/C++ recently, but I don't understand the difference between
void a(char* str, int len)
{
cout<<str<<len;
}
and
void a(char* str)
{
cout<<str<<strlen(str);
}

When you pass char* without a length, how would you know how many elements to process? char* means a pointer to a character. When you pass a pointer, you have no idea (and cannot find out) how much memory (if any) was allocated for the pointer.
That's why C-strings are null-terminated (they end with a '\0' character), so you can detect their length by iterating the pointer. Hence, if you want to use a pointer without giving the length of its allocated memory, you need to obey some conventions. But in general, e.g. when passing a buffer, you shouldn't expect any end-signalling character, so in this case you need to pass the length; otherwise you may end up reading or writing out of bounds.
For your particular example, you're fine with passing only a pointer provided you use your function only on C-strings, since strlen(str) uses this convention of counting until encountering a '\0'.
Buffer overflows are among the messiest and most nightmarish programming errors, and they can result in serious security issues. That's why you should try (whenever possible) to use std::string from the C++ standard library instead of C-style char* strings.
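The two conventions discussed above can be put next to each other. The functions below are illustrative sketches with invented names: one takes a raw buffer plus its length, the other relies on the terminator.

```cpp
#include <cstddef>

// Counts occurrences of target in a raw buffer. The length is required
// because the buffer may contain '\0' bytes or have no terminator.
std::size_t count_in_buffer(const char* buf, std::size_t len, char target) {
    std::size_t hits = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (buf[i] == target) {
            ++hits;
        }
    }
    return hits;
}

// Same idea for a C-string: the terminator marks the end, so no length
// parameter is needed, but str must really be null-terminated.
std::size_t count_in_c_string(const char* str, char target) {
    std::size_t hits = 0;
    for (; *str != '\0'; ++str) {
        if (*str == target) {
            ++hits;
        }
    }
    return hits;
}
```

Given the four bytes 'a', '\0', 'a', 'b', the buffer version finds two 'a' characters, while the C-string version stops at the embedded '\0' and finds only one.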

A C-string should always end with a terminating character, which we call the null character. It is the character with value 0 (written '\0'), not the digit character '0'.
When we initialize a char array with a string literal, the compiler automatically adds the '\0' to the end.
char c[] = "Hello";
This will create an array of char with six elements. Yes, six elements.
// c holds {'H', 'e', 'l', 'l', 'o', '\0'}
When you print c, the output routine reads until it finds that '\0'. What if someone replaces it?
c[5] = '!';
Then the system can't determine the end of the text. It will keep on reading memory (which does not belong to that variable, or maybe even to the program) until it hits a null char.
That is the main reason to pass the size (or the number of chars to read) to a function.
On the other hand, if you need to read some data from a stream, you can use a buffer. In that case, you should specify how many bytes to read, in that way you will not cause buffer overflows.
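When the size of the array is known, it is possible to check for a terminator without ever reading past the end. The sketch below uses std::memchr for that; the function name has_terminator is made up for illustration.

```cpp
#include <cstddef>
#include <cstring>

// Returns true if a '\0' appears within the first capacity bytes of buf.
// std::memchr never reads past capacity, unlike std::strlen.
bool has_terminator(const char* buf, std::size_t capacity) {
    return std::memchr(buf, '\0', capacity) != nullptr;
}
```

A six-element array holding "Hello" passes this check; after its last element is overwritten with '!', the check fails, signalling that string functions must not be used on it.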

The answers above are to the point, so I'm going to discuss another perspective on the practice of passing a length along with a char *.
As others said, the string pointed to by a char * does not always end with \0, and strlen() only works when it does. There are certain use cases, for example binary encoding, where data is represented as a string of bytes. In such cases the char * data would not end with \0. Besides, there can be use cases that read or write only up to a certain length. There it is always necessary to check that the requested length is within the total length of the string. So, as a general convention, the length is passed explicitly, and it can then be used in any way desired.

The importance of null character when initializing char arrays

I'm new with C++ and I started to wonder, what happens if you leave the null character out when defining a char array?
For example, if I define a char array with the null character:
char myarray[] = {'a', 'b', 'c', '\0'};
and then I define it without the null character:
char myarray[] = {'a', 'b', 'c'};
What is the importance of the null character in this scenario? Might the absence of null character in the example above cause some problems later on?...Do you recommend always including or excluding the null character when defining char arrays this way?
Thank you for any help :)

It means that anything that takes a char* as parameter, expecting it to be a null-terminated string, will invoke undefined behaviour, and fail in one way or another*.
Some examples are strlen, the std::string(const char*) constructor, the std::ostream operator<< specialization for char*...
* undefined behaviour means it could even work "correctly", but there is no guarantee this is reproducible.
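A non-terminated array can still be turned into a std::string safely by using the constructor that takes a pointer and a count instead of the one that takes only a const char*. The helper below is a sketch with a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Builds a std::string from a char array that has no terminator.
// The (pointer, count) constructor copies exactly N chars, so it never
// searches for a '\0'.
template <std::size_t N>
std::string to_string_exact(const char (&arr)[N]) {
    return std::string(arr, N);
}
```

Note that this copies every element of the array, so passing a string literal would include its trailing '\0' as an extra character.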

char myarray[] = {'a', 'b', 'c'};
If you define it without the nul character, it is a valid character array but not a valid string.
1. You should not use this character array as an argument to string functions like strlen(), strcpy(), etc.
2. You should not print it with %s, as we print strings in C.
3. You can print it character by character.
4. You can compare it character by character.
Further char myarray[] = {'a', 'b', 'c', '\0'}; is equal to "abc"
but char myarray[] = {'a', 'b', 'c'}; is not equal to "abc"
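A comparison with an explicit count works for both kinds of array, since it never looks for a terminator. The sketch below wraps std::memcmp; the name same_chars is invented for illustration.

```cpp
#include <cstddef>
#include <cstring>

// Compares the first n chars of two buffers; neither needs a terminator.
bool same_chars(const char* a, const char* b, std::size_t n) {
    return std::memcmp(a, b, n) == 0;
}
```

With n set to 3, the unterminated {'a', 'b', 'c'} matches the first three characters of "abc". The two arrays still differ in size: the literal has four elements, the unterminated array only three.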

If you don't have the terminating null-character, you can still use your char array like you could with the null-char.
However, functions that expect null-terminated strings (like strlen), will not stop at the end, since they don't know where the end is. (That's what the null-char is for)
They will therefore continue to work in memory until they either go out of bounds and you get a segmentation fault or run until they find their null-char.
Basically, if you want your char array to be a string, append a null-char to denote the end.

what happens if you leave the null character out when defining a char array?
You get an array containing just the characters you specify.
What is the importance of the null character in this scenario?
In C, it's conventional to represent a string as a null-terminated character array. This convention is sometimes used in C++ to interoperate with C-style interfaces, or to work with string literals (which inherited their specification from C), or because the programmer thinks it's a good idea for some reason. If you're going to do this, then obviously you'll need to terminate all the arrays you want to interpret as strings.
The question seems to be about C++, although you've also tagged it C for some reason. In C++, you usually want to use std::string to manage strings for you. Life is too short for messing around with low-level arrays and pointers.
Might the absence of null character in the example above cause some problems later on?
If you pass a non-terminated array to a function expecting a terminated array, then it will stomp off the end of the array causing undefined behaviour.
Do you recommend always including or excluding the null character when defining char arrays this way?
I recommend understanding what the array is supposed to be used for, and include the terminator if it's supposed to be a C-style string.

What is the importance of the null character in this scenario?
High.
Might the absence of null character in the example above cause some problems later on?
Yes. C and C++ functions taking a char* that points to this C-string will require it to be null-terminated.
Do you recommend always including or excluding the null character when defining char arrays this way?
Neither. I recommend using std::string, since you said you are writing C++.

The null character is used by functions like strlen if you wish to use your array as a string. If I need a string because I want to use some text, I write:
const char* mystr = "abc"; // it is already null terminated
writing:
char myarray[] = {'a', 'b', 'c', '\0'};
is too verbose.

Why isn't strlen working for me?

char p[4]={'h','g','y'};
cout<<strlen(p);
This code prints 3.
char p[3]={'h','g','y'};
cout<<strlen(p);
This prints 8.
char p[]={'h','g','y'};
cout<<strlen(p);
This again prints 8.
Please help me as I can't figure out why three different values are printed by changing the size of the array.

strlen starts at the given pointer and advances until it reaches the character '\0'. If you don't have a '\0' in your array, it could be any number until a '\0' is reached.
Another way to reach the number you're looking for (in the case you've shown) is by using: int length = sizeof(p)/sizeof(*p);, which will give you the length of the array. However, that is not strictly the string length as defined by strlen.
As John Dibling mentions, the reason that strlen gives the correct result on your first example is that you've allocated space for 4 characters, but only used 3; the remaining 1 character is automatically initialized to 0, which is exactly the '\0' character that strlen looks for.
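The sizeof(p)/sizeof(*p) idiom silently gives a wrong answer if p is a pointer rather than an array. A template that deduces the size from the array type avoids that; the sketch below uses a made-up name and assumes C++11 (C++17 offers std::size for the same purpose).

```cpp
#include <cstddef>

// Type-safe variant of sizeof(p)/sizeof(*p): deduces N from the array
// type, and fails to compile if given a pointer instead of an array.
template <typename T, std::size_t N>
constexpr std::size_t array_length(const T (&)[N]) {
    return N;
}
```

This reports the number of elements in the array, which for the three declarations in the question is 4, 3 and 3. Only the first has room for the implicit '\0'.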

Only your first example has a null terminated array of characters - the other two examples have no null termination, so you can't use strlen() on them in a well-defined manner.
char p[4]={'h','g','y'}; // p[3] is implicitly initialized to '\0'
char p[3]={'h','g','y'}; // no room in p[] for a '\0' terminator
char p[]={'h','g','y'}; // p[] implicitly sized to 3 - also no room for '\0'
Note that in the last case, if you used a string literal to initialize the array, you would get a null terminator:
char p[]= "hgy"; // p[] has 4 elements, last one is '\0'

That will get you an unpredictable number. strlen requires that strings be terminated with a '\0' to work.

try this:
char p[4]={'h','g','y', '\0'};

strlen is a standard library function that works with strings (in C sense of the term). String is defined as an array of char values that ends with a \0 value. If you supply something that is not a string to strlen, the behavior is undefined: the code might crash, the code might produce meaningless results etc.
In your examples only the first one supplies strlen with a string, which is why it works as expected. In the second and the third case, what you supply is not a string (not terminated with \0), which is why the results expectedly make no sense.

Terminate your char buffer with '\0':
char p[4]={'h','g','y', '\0'};

This is because strlen() expects to find a null-terminator for the string. In this case, you don't have it, so strlen() keeps counting until it finds a \0 or gives a memory access violation and your program dies. RIP!
