How can you identify a string with a char pointer? - c++

I am learning c++ from a book, but I am already familiar with programming in c# and python. I understand how you can create a char pointer and increment the memory address to get further pieces of a string (chars), but how can you get the length of a string with just a pointer to the first char of it like in below code?
Any insights would help!
String::String(const char * const pString)
{
Length = strlen(pString);
}

This behavior is explained within the docs for std::strlen
std::size_t std::strlen(const char* str);
Returns the length of the given byte string, that is, the number of characters in a character array whose first element is pointed to by str up to and not including the first null character. The behavior is undefined if there is no null character in the character array pointed to by str.
So it will count the number of characters up to, but not including, the '\0' character. If this character is not encountered within the allocated bounds of the character array, the behavior is undefined, and likely in practice will result in reading outside the bounds of the array. Note that a string literal will implicitly contain a null termination character, in other words:
"hello" -> {'h', 'e', 'l', 'l', 'o', '\0'};

You can implement your own method in C-style like:
size_t get_length(const char * str) {
const char * tmp = str;
while(*str++) ;
return str - tmp - 1;
}
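To tie this back to the constructor in the question: once strlen has found the length, the class can allocate a buffer and copy the characters, terminator included. This is only a rough sketch. The Data member and the accessor names are assumptions made for illustration, since the book's full class is not shown here.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical minimal String, loosely modelled on the constructor in the
// question. Only Length appears in the original; Data, length() and c_str()
// are invented for this sketch.
class String {
public:
    String(const char* const pString)
        : Length(std::strlen(pString)),   // counts chars up to, not including, '\0'
          Data(new char[Length + 1]) {    // one extra element for the terminator
        std::memcpy(Data, pString, Length + 1);  // copies the '\0' as well
    }
    ~String() { delete[] Data; }

    // Copying is disabled to keep the sketch short and safe.
    String(const String&) = delete;
    String& operator=(const String&) = delete;

    std::size_t length() const { return Length; }
    const char* c_str() const { return Data; }

private:
    std::size_t Length;  // declared before Data, so it is initialized first
    char* Data;
};
```

With this sketch, String s("hello"); gives s.length() == 5, and s.c_str() points to a six-element buffer ending in '\0'.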

Related

What is difference of char l[] {'try'} and char l[] {'t', 'r', 'y'} in c++?

When I was trying to cout them
char l[] {'t', 'r', 'y'};
std::cout << l << std::endl;
I got try printed in the terminal.
However, when I tried this.
char l[] {'try'};
I only got y.
I came to c++ from python so lots of things don't make sense to me, so can you explain the difference of these two expressions?
None of your examples are valid.
The first one has undefined behavior.
char l[] {'t', 'r', 'y'};
std::cout << l << std::endl;
Here you define l to be a char[3]. That's fine. Printing it like this is not, however, because the operator<< overload you use requires a null-terminated string. Your char[3] is not null terminated, so the operator<< overload will look for the null terminator out of bounds, which causes undefined behavior. A working variant would have been
char l[] {'t', 'r', 'y', '\0'};
which would have made it a char[4] with a \0 (null terminator) at the end.
The second example tries to create the array from a multicharacter literal. Single quotes are for individual characters, so 'try' is not interpreted as a string. Many compilers will simply refuse to compile it. To create the array from a string literal, use double quotes:
char l[] {"try"};
or the idiomatic way:
char l[] = "try";
Both versions will create a null terminated array, just like char l[] {'t', 'r', 'y', '\0'};
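The equivalence of the three spellings can be checked directly. The following is a small sketch; same_initialization is a made-up name used only for this illustration.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical check: returns true when the three spellings produce
// byte-for-byte identical arrays.
bool same_initialization() {
    char a[] {"try"};
    char b[] = "try";
    char c[] {'t', 'r', 'y', '\0'};

    // Each array has four elements: three letters plus the terminator.
    static_assert(sizeof a == 4 && sizeof b == 4 && sizeof c == 4,
                  "each array has four elements");

    return std::memcmp(a, b, sizeof a) == 0 &&
           std::memcmp(b, c, sizeof b) == 0;
}
```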
char l[] {'t', 'r', 'y'}; defines l to be array of three characters with no terminating null character, so using it in std::cout << l is bad because the insert operator << needs the terminating null character to tell it where the end is.
char l[] {'try'}; defines l to be an array of one character because 'try' is a single number formed from three characters in an implementation-defined way. Since it is only one number, the size of the array l is set to be one element. To initialize l with that number, it is converted to char, which loses information about the three characters.
char l[] {"try"}; would define l to be an array of four characters, because string literals (marked by " instead of ') automatically include a terminating null character. Also, even though a string literal is “one thing,” there is a special rule that says when a string literal is used to initialize an array, its contents are used to initialize the array. So the array size is four elements because there are four characters in the string.
In this declaration
char l[] {'t', 'r', 'y'};
there is declared a character array that contains exactly three characters. As the array does not contain a string (a sequence of characters terminated with the zero or the so-called null character '\0') then the next statement
std::cout << l << std::endl;
invokes undefined behavior because in this case the operator << expects a pointer to a string (an array designator with rare exceptions is converted to pointer to its first element).
Instead you could write for example
std::cout.write( l, sizeof( l ) ) << std::endl;
Otherwise you could initialize the array by a string literal like
char l[] {"try"};
and write
std::cout << l << std::endl;
In this declaration
char l[] {'try'};
there is declared a character array that is initialized by the multicharacter literal 'try' that has an implementation defined value and the type int. Multicharacter literals are conditionally supported.
The compiler should issue a message for such a declaration that there is used a narrowing conversion from int to char.
Again you may not write
std::cout << l << std::endl;
because the array does not contain a string. It contains only one element with an implementation defined value.
Try this code snippet
char l[]{ 'try' };
std::cout << sizeof( l ) << '\n';
To output the array you could write as already shown above
std::cout.write( l, sizeof( l ) ) << std::endl;
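The same explicit-length idea works when the bytes are needed as text rather than printed directly. A small sketch, where to_text is a hypothetical helper name; it passes the element count to write() so no terminator is ever searched for.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: copies exactly N chars from an array into a
// std::string. The array is taken by reference so its size N is known
// and no null terminator is required.
template <std::size_t N>
std::string to_text(const char (&arr)[N]) {
    std::ostringstream out;
    out.write(arr, N);  // writes N chars, never reads past the array
    return out.str();
}
```

For char l[] {'t', 'r', 'y'}; this yields a three-character string. If a string literal is passed instead, N also counts the '\0', so the result would contain an embedded null character.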
When you insert an array into an output stream, the array argument implicitly converts to a pointer to char. When you pass a pointer to char into an output stream, it must point to a null terminated string of characters. If the pointer is not to a null terminated character string, then the behaviour of the program is undefined. Undefined behaviour should be avoided.
char l[] {'t', 'r', 'y'};
std::cout << l << std::endl;
l is an array of 3 characters that does not contain a null terminator character. In this example you insert a pointer to non-null terminated string into an output stream and the behaviour of the program is undefined. The program is broken. Don't do this.
char l[] {'try'};
l is an array of 1 character that does not contain a null terminator character. If you insert this array to an output stream, then the behaviour of the program is undefined. The program is broken. Don't do this.
Note: 'try' is a multicharacter literal. Its type is int and its value is implementation defined. Multicharacter literals aren't useful very often. What you probably intended to do is to use a string literal:
char l[] = "try";
The string literal "try" is an array of 4 characters; ending in the null terminator character.

Get away with Initialize the char array without putting \0 at the end of string

I am new to the C++ language. Recently I was taught that
we should put '\0' at the end of a char array when initializing it, for example:
char x[6] = "hello"; //OK
However,if you do :
char x[5] = "hello";
Then this would raise the error :
initializer-string for array of chars is too long
Everything went as I expected until I found that the expression below does not raise a compile error:
char x[5] = {'h','e','l','l','o'};
This really confuses me, so I would like to ask two questions:
1. Why doesn't the expression char x[5] = "hello"; raise an error?
2. To my knowledge, strlen() stops only when it finds '\0' to determine the length of a char array. In this case, what would strlen(x) return?
Thanks!
The string literal "hello" has six characters, because there's an implied nul terminator. So
char x[] = "hello";
defines an array of six char. That's almost always what you want, because the C-style string functions (strlen, strcpy, strcat, etc.) operate on C-style strings, which are, by definition, nul terminated.
But that doesn't mean that every array of char will be nul terminated.
char x[] = { 'h', 'e', 'l', 'l', 'o' };
This defines an array of five char. Applying C-style string functions to this array will result in undefined behavior, because the array does not have a nul terminator.
You can do character-by-character initialization and create a valid C-style string by explicitly including the nul terminator:
char x[] = { 'h', 'e', 'l', 'l', 'o', '\0' };
This defines an array of six char that holds a C-style string (i.e., a nul terminated sequence of characters).
The key here is to separate in your mind the general notion of an array of char from the more specific notion of an array of char that holds a C-style string. The latter is almost always what you want to do, but that doesn't mean that there is never a use for the former. It's just that the former is uncommon.
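One way to make that distinction concrete is to test whether an array contains a terminator anywhere within its bounds. This is a sketch under the assumption of a C++14 or later compiler; holds_c_string is an invented name, not a standard function.

```cpp
#include <cstddef>

// Hypothetical helper: true if the array contains a '\0' within its N
// elements, i.e. if it holds a C-style string. Taking the array by
// reference keeps the search inside the array's bounds.
template <std::size_t N>
constexpr bool holds_c_string(const char (&arr)[N]) {
    for (std::size_t i = 0; i < N; ++i) {
        if (arr[i] == '\0') {
            return true;
        }
    }
    return false;
}
```

An array initialized from "hello" passes the check, while one initialized from { 'h', 'e', 'l', 'l', 'o' } does not.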
As an aside, in C you're allowed to elide the nul terminator:
char x[5] = "hello";
This is legal C, and it creates an array of 5 char, with no nul terminator. In C++ that's not legal.
Why doesn't expression char x[5] = "hello"; raise an error?
This is not true. The appearance of an error is expected in this case.
To my knowledge, the function strlen() would stop only if it finds '\0' to determine the length of the char array, in this case, what would strlen(x) return?
If you manage to run such code, the program has undefined behavior, so you cannot rely on any particular result. strlen() only stops counting when it finds a null terminator, so it may read past the end of the char array into memory that does not belong to it; that out-of-bounds read is where the UB is invoked.
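When an array might lack a terminator, the search can be limited to the array itself. The following is a minimal sketch, assuming the array (not just a pointer) is available; bounded_length is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: like strlen, but never reads past the N elements
// of the array. Returns N when no '\0' is found.
template <std::size_t N>
std::size_t bounded_length(const char (&arr)[N]) {
    const char* end = std::find(arr, arr + N, '\0');
    return static_cast<std::size_t>(end - arr);
}
```

A result equal to the array size means no terminator was found, so the array should not be passed to strlen or other C-style string functions.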

Is the "NUL" character ("\0") byte separated or is it assigned to that char-array with the string?

If I define a string:
char array[5] = {"hello"};
Is the NUL character (\0) byte added "hidden" to "array[5]", so that the array consists of 6 bytes in memory rather than 5?
OR does the NUL character byte lie "separated" from "array[5]" in memory, just after the last element of the char array, without being part of "array[5]"?
If I write this:
i = strlen(array);
printf("The Amount of bytes preserved for array: %d",i);
What would be the result for the amount of bytes preserved for array[5]?
Does the "NUL" character ("\0") byte lie separated after the last element of char-array in the memory or is it assigned to that char-array?
No. Neither answer is correct. See below for details.
Answer for C:
If you write your code like that, with an explicit size that is too small for the terminator, array will have exactly 5 elements and there will be no NUL character.
strlen(array) has undefined behavior because array is not a string (it has no terminator). char array[5] = {"hello"}; is equivalent to char array[5] = {'h', 'e', 'l', 'l', 'o'};.
On the other hand, if you write
char array[] = "hello";
it is equivalent to
char array[6] = {'h', 'e', 'l', 'l', 'o', '\0'};
and strlen will report 5.
The relevant part of the C standard is:
An array of character type may be initialized by a character string literal, optionally
enclosed in braces. Successive characters of the character string literal (including the
terminating null character if there is room or if the array is of unknown size) initialize the
elements of the array.
(Emphasis mine.)
Answer for C++:
Your code is invalid. [dcl.init.string] states:
There shall not be more initializers than there are array elements. [ Example:
char cv[4] = "asdf"; // error
is ill-formed since there is no space for the implied trailing '\0'. — end example ]
In C++, the initializer in char array[5] = {"hello"} needs six bytes, but the array has room for only five. Therefore, the array declaration is ill-formed.
Alternatively, this works: char array[6] = {"hello"}.

Why can I assign a string literal whose length is less than the array itself?

I'm a bit baffled that this is allowed:
char num[6] = "a";
What is happening here? Am I assigning a pointer to the array or copying the literal values into the array (and therefore I'm able to modify them later)?
Why can I assign a string literal less than the array itself? What is happening here?
This is well defined. When initializing a character array with a string literal:
If the size of the array is specified and it is larger than the number
of characters in the string literal, the remaining characters are
zero-initialized.
So,
char num[6] = "a";
// equivalent to char num[6] = {'a', '\0', '\0', '\0', '\0', '\0'};
Am I assigning a pointer to the array or copying the literal values into the array (and therefore I'm able to modify them later)?
The value will be copied, i.e. the elements of the array will be initialized by the chars of the string literal (including '\0').
String literals can be used to initialize character arrays. If an array is initialized like char str[] = "foo";, str will contain a copy of the string "foo".
Successive characters of the string literal (which includes the implicit terminating null character) initialize the elements of the array.
char num[6] = "a";
is equivalent to
char num[6] = {'a', '\0', '\0', '\0', '\0', '\0'};
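The zero fill and the fact that the array holds its own modifiable copy can both be observed with a short check. This is a sketch only; zero_from is a made-up helper name.

```cpp
#include <cstddef>

// Hypothetical helper: true if every element from index `from` to the
// end of the array is '\0'.
template <std::size_t N>
bool zero_from(const char (&arr)[N], std::size_t from) {
    for (std::size_t i = from; i < N; ++i) {
        if (arr[i] != '\0') {
            return false;
        }
    }
    return true;
}
```

After char num[6] = "a"; the call zero_from(num, 1) returns true. Writing num[1] = 'b'; afterwards is allowed, because the array is a copy of the literal, and zero_from(num, 1) then returns false.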
Why can I assign a string literal less than the array itself?
This is allowed by the language. It is often useful to be able to add more characters to the array later, which wouldn't be possible if the existing characters filled the entire array.
Am I assigning a pointer to the array
No. You cannot assign a pointer to an array, so that is not happening.
or copying the literal values into the array
That is exactly what is happening.
and therefore I'm able to modify them later
You are able to modify the array, indeed.
Just use char num[6] = {"a"};. It works.
This kind of declaration is a special syntax sugar thing. It's equivalent to
char num[6] = {'a', 0};
The array is always modifiable. Its contents after such a declaration would be a character representing 'a', a zero (NUL terminator) and the remainder of the array will also be zeroed (zero initialization).
That is one type of declaration, which is equivalent to
char num[6] = {'a','\0'};
You declared a C string that can hold at most 5 ordinary characters, because the last element must be '\0' to terminate the C string.
In the declaration you can use
char num[6] = "a";
After that, you can assign a new value in one of two ways:
With strcpy(dest,src)
strcpy(num,"test");
Char by char
num[0]='t';
num[1]='e';
num[2]='s';
num[3]='t';
num[4]='\0';
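strcpy does not know how large the destination is, so a string longer than 5 characters would overflow num. One possible guard is sketched below using std::snprintf, which truncates instead of overflowing. copy_text is a hypothetical helper name, and this is one option among several.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: copies src into dest, truncating if necessary,
// and always leaves dest null terminated. Returns true if all of src
// fit. src must itself be a null-terminated string.
template <std::size_t N>
bool copy_text(char (&dest)[N], const char* src) {
    // snprintf writes at most N - 1 characters plus '\0' and returns the
    // length src would have needed.
    const int needed = std::snprintf(dest, N, "%s", src);
    return needed >= 0 && static_cast<std::size_t>(needed) < N;
}
```

With char num[6], copy_text(num, "test") succeeds, while copy_text(num, "too long") returns false and leaves the first five characters followed by '\0' in num.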

Difference in the initialisation of character array

I do the following for character array initialisation:
char a[] = "teststring";
char b[]={'a','a','b','b','a'};
While for the first, if I need to get the string length, I must do strlen(a) ....for the other string I should do sizeof(b)/sizeof(b[0]).
why this difference?
EDIT : (I got this)
char name[13]="StudyTonight"; //valid character array initialization
char name[10]={'L','e','s','s','o','n','s','\0'}; //valid initialization
Remember that when you initialize a character array by listing all its characters separately then you must supply the '\0' character explicitly.
I get that with char b we have to add '\0' to make a proper initialisation.
ANOTHER :
Therefore, the array of char elements called myword can be initialized with a null-terminated sequence of characters by either one of these two statements:
char myword[] = { 'H', 'e', 'l', 'l', 'o', '\0' };
char myword[] = "Hello";
A string literal, like "teststring" contains the characters between the double-quotes, plus a terminating char with value zero. So
char a[] = "ab";
has the same effect as;
char a[] = {'a', 'b', '\0'};
strlen() searches for that character with value '\0'. So strlen(a) in this case will return 2.
Conversely, sizeof() gets the actual size of the memory used. Since sizeof(char) is 1, by definition in the standard, this means sizeof(a) gives the value of 3 - it counts the 'a', the 'b', and the '\0'.
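Both numbers can be confirmed at compile time. The sketch below assumes a C++14 or later compiler so that a loop is allowed in a constexpr function; count_chars is a made-up stand-in for strlen, which is not itself usable in a static_assert.

```cpp
#include <cstddef>

// Hypothetical helper (not part of the standard library): counts the
// characters before the first '\0', the same rule std::strlen follows.
constexpr std::size_t count_chars(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}

// The literal "ab" is an array of three chars: 'a', 'b' and '\0'.
static_assert(sizeof("ab") == 3, "array size includes the terminator");
static_assert(count_chars("ab") == 2, "length excludes the terminator");
```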
a is a C-style string, i.e, null-terminated char array. The initialization is equivalent to:
char a[] = {'t','e','s','t','s','t','r','i','n','g','\0'};
b, however, is not null-terminated, so it's not a C-style string, you can't use functions like std::strlen() because they are only valid for C-style strings.
String literals are expanded into char arrays, but also include the terminating zero char. So think the
char a[] = "teststring";
as if you had typed this
char a[] = {'t','e','s','t','s','t','r','i','n','g','\0'};
A rule of thumb
Whenever you will use strlen() on a char array, use a string literal for its initialization. The strlen function can be thought of as a simple scan for the terminating zero char (\0), counting the iterations needed.
A word about sizeof
Even if sometimes used with parentheses, sizeof is an operator, an integral part of the C++ language (inherited from C). In cases like char c[] = "hello";, sizeof(c) will return 6, which is exactly 1 more than strlen(c), and you might be thinking: "let's skip that inefficient scanning for the terminator". But sizeof stops being so "efficient" as soon as it is applied to a pointer, where it yields the size of the pointer rather than of the array, and arrays can (and will) be used as pointers whenever required. Look at the following example:
#include <iostream>
// naive approach, don't do that
int myarraysize(char s[])
{
return sizeof(s);
}
int main ()
{
char c[] = "hello";
std::cout << sizeof(c) << " vs " << myarraysize(c) << std::endl;
return 0;
}
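If a function really needs the array's size, one option is to take the array by reference so that the size stays part of the type. This is a sketch with invented names (array_size, decayed_size), shown next to the decaying version for contrast.

```cpp
#include <cstddef>

// Hypothetical helper: the array is passed by reference, so N is deduced
// from the argument's type and no decay to a pointer takes place.
template <typename T, std::size_t N>
constexpr std::size_t array_size(T (&)[N]) {
    return N;
}

// For contrast: the parameter here is a plain pointer, which is what a
// parameter declared as char s[] really is. sizeof yields the size of
// the pointer, not of the array the caller passed.
constexpr std::size_t decayed_size(char* s) {
    return sizeof(s);
}
```

For char c[] = "hello"; array_size(c) is 6, whereas decayed_size(c) equals sizeof(char*) regardless of the array's length.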
You can always write
char b[]={'a','a','b','b','a','\0'};
to overcome "the difference".
Also note
sizeof(b)/sizeof(b[0])
essentially boils down to
sizeof(b)
since sizeof(char) is always 1. Your formula is used for any other array element types.