Why does strlen() apply on character arrays also? - c++

strlen() is a function whose argument should be a string, so why is it also applicable to character arrays?
For example:
char abc[100];
cin.getline(abc,100);
len=strlen(abc);
If it works for a character array and tells the number of elements, can it also be used for an int array?
Note: I am using Turbo C++.

It's important to understand what a string is. The C standard defines a string as "a contiguous sequence of characters terminated by and including the first null character". It's a data format, not a data type; a C-style string may be contained in an array of char. This is not to be confused with the C++-specific type std::string, defined in the <string> (note: no .h suffix) header.
The <string.h> header, or preferably the <cstring> header, is incorporated into C++ from the C standard library. The functions declared in that header operate on C-style strings, or on pointers to them.
The argument to strlen is of type char*, a pointer to a character. (It's actually const char*, meaning that strlen promises not to modify whatever it points to.)
An array expression is, in most contexts, implicitly converted to a pointer to the initial element of the array. (See section 6 of the comp.lang.c FAQ for the details.)
The char* argument that you pass to strlen must point to the initial element of an array of characters, and there must be a null character ('\0') somewhere in the array to mark the end of the string. It computes the number of characters up to, but not including, the null terminator.
It does not (and cannot) compute the number of elements in an array, only the number of characters in a string -- which it can do only if the array actually contains a valid string. If there is no null character anywhere in the array, or if the pointer is null or otherwise invalid, the behavior is undefined.
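The difference between string length and array size can be sketched as follows. This is a minimal illustration, not part of the original answer; the helper names string_length and array_capacity are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: length of the C-style string stored at buf.
// Precondition: buf points to a null-terminated sequence of char.
std::size_t string_length(const char* buf) {
    assert(buf != nullptr);
    return std::strlen(buf);  // counts chars before the first '\0'
}

// Hypothetical helper: number of elements in a real array (not a pointer).
// N is deduced from the array type, so the result ignores the contents.
template <std::size_t N>
std::size_t array_capacity(const char (&)[N]) {
    return N;
}
```

For char abc[100] holding "hello", string_length(abc) yields 5 while array_capacity(abc) yields 100. Only the first result changes when the contents change.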
So when you write:
char abc[100];
cin.getline(abc,100);
len=strlen(abc);
the call to cin.getline ensures that the array abc contains a properly null-terminated string. strlen(abc) calls strlen, passing it the address of the initial character; it's equivalent to strlen(&abc[0]).
No, strlen will not work on an array of int. For one thing, that would pass an int* value, which doesn't match the char* that strlen requires, so it probably wouldn't compile. Even ignoring that, strlen counts characters, not ints. (You can write your own similar function that counts ints if you like, but it still has to have some way to find the end of the elements that you're interested in counting. It doesn't have access to the actual length of the array unless you pass it explicitly.)
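A "similar function that counts ints" might look like the sketch below. The name count_until and the choice of sentinel are assumptions made for illustration; the caller has to guarantee that the sentinel really occurs in the sequence, just as strlen relies on the null character.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical analogue of strlen for int sequences.
// Assumption: `sentinel` occurs somewhere at or after p; otherwise the
// loop runs off the end of the array and the behavior is undefined.
std::size_t count_until(const int* p, int sentinel) {
    assert(p != nullptr);
    std::size_t n = 0;
    while (p[n] != sentinel) {
        ++n;
    }
    return n;  // number of elements before the sentinel
}
```

With int data[] = {4, 8, 15, -1, 16}, the call count_until(data, -1) returns 3, which is not the size of the array (5). The sentinel value can no longer be used as ordinary data.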

strlen only works for null-terminated char arrays.
If there's no null character at the end of the string [1], calling strlen causes undefined behavior.
std::cin.getline automatically appends the null character at the end, so that's why your strlen worked.
[1] (which is not necessarily the same as the end of the array)

In C, strings are arrays of char. All strlen does is count the number of items until a 0 (or null character) is found.

Related

How to get the size of an array using a pointer to the first element and properties of "\0"? [duplicate]

Given a pointer to the first element of an array, how can I get the size of the array? For example, if I were given a but not arr here:
int arr[]={1,2,3};
int* a=arr;
Also could anyone explain a bit about the properties of "\0" in the array?
My understanding of "\0" is that it marks the end of the array. Is that correct? And is there more I should pay attention to?
Does "\0" work for all types of arrays or only char type? If it works only for char type, are there any similar symbols for other types of arrays?
Will '\0' be automatically inserted at the end of the array? Or it will be automatically inserted only under some specific conditions?
Given:
int arr[]={1,2,3};
int* a=arr;
you cannot get the size of the array given just the value of a. The implicit conversion from int[3] to int* is lossy; it discards any information about the size of the array, and yields a pointer to the array's initial element.
The null character '\0' marks the end of a C-style string (a null-terminated character string) and can be used to determine the length of the string -- not the size of the array that contains it.
There is no such convention for arrays of non-character types. Of course you can define such a convention in your own code, but you're going to have problems if 0 is a valid element value.
If you need to do this kind of thing, you're better off using C++'s standard library. For example, the type std::vector<int> holds a sequence of int values and keeps track of its length for you. If you really need to write C-style code that deals with arrays via pointers to their elements, you'll need to track the array length yourself.
String literals do automatically provide a terminating null character, but again that applies only to strings, not to arrays in general.
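The two options mentioned above (tracking the length yourself, or letting std::vector do it) can be sketched like this. The function name sum is a hypothetical example, not something from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C-style: the pointer carries no size, so the count travels with it.
long sum(const int* first, std::size_t count) {
    assert(first != nullptr || count == 0);
    long total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += first[i];
    }
    return total;
}

// C++ style: the container knows its own size.
long sum(const std::vector<int>& values) {
    return sum(values.data(), values.size());
}
```

At the point where arr is still an array, sizeof arr / sizeof arr[0] gives 3. Once only the pointer a is available, that expression no longer describes the array, so the count has to be passed along explicitly.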
I wonder given a pointer to the first element of an array, how can I get the size of the array? eg. int arr[]={1,2,3};int* a=arr, given a(not arr), how to get the size of the array?
You generally cannot get the size using only the pointer.
1.my understanding of "\0" is that it marks the end of the array, is that correct?
You can choose any value to represent the end of an array.
"\0" isn't typically used to represent the end of an array.
'\0', i.e. the null character, is often used to represent the end of a character string. Such a string is called a "null-terminated string".
Does "\0" work for all types of arrays or only char type?
"\0" is a string literal. Strings are arrays of chars. An element of an array of chars is not an array of chars, but a char.
Will '\0' be automatically inserted at the end of the array?
No, '\0' will not be inserted at the end of the array. Not all arrays even contain elements of a type that a char could be converted into. However, there is an implicit '\0' at the end of every string literal.
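The implicit terminator of a string literal, and its absence in a brace-initialized array, can be checked at compile time. The following is a small sketch; element_count and no_terminator are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical helper: element count of any char array, including literals.
template <std::size_t N>
constexpr std::size_t element_count(const char (&)[N]) {
    return N;
}

// A string literal has one more element than it has visible characters.
static_assert(element_count("abc") == 4, "3 letters plus the implicit '\\0'");
static_assert(element_count("") == 1, "an empty literal still holds '\\0'");

// A braced list of chars gets exactly the elements listed, no terminator.
constexpr char no_terminator[] = {'a', 'b', 'c'};
static_assert(element_count(no_terminator) == 3, "no '\\0' was added");
```

Passing no_terminator to strlen would therefore be undefined behavior, because nothing marks the end of the string.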

Why does casting a string (array) to char result in a lower case "c"

I can't find where it's documented that casting a string to char results in a "c" character:
Serial.println(char(67)); // => C
Serial.println(char(81)); // => Q
Serial.println(char('C')); // => C
Serial.println(char('Q')); // => Q
Serial.println(char("C")); // => c
Serial.println(char("Q")); // => c
"Q" actually is a character array of two characters: { 'Q', '\0' } (null-terminated C-string of length 1), residing at some specific address in memory.
Arrays, whether explicitly defined or coming from string literals, decay to pointers automatically in most contexts, e.g. when they are passed as function arguments, when they are dereferenced, and in particular when a cast is applied to them.
So what happens here is actually equivalent to
char const* ptr = "Q";
char(ptr);
Actually, this is undefined behaviour, as char, be it signed or not, is not large enough to hold a pointer value, so anything could happen. Under the hood, the code will most likely be treated as if you (fully legally) did:
char((unsigned char)(uintptr_t)ptr)
simply cutting off all but the least significant byte.
What remains is the least significant byte of the memory address, and it's pure accident that it matches 99, the ASCII value of c; it could have been any other value.
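To get the character itself rather than a piece of its address, index or dereference the string. The sketch below contrasts the two; first_char and low_byte_of_address are hypothetical helper names, and the second one assumes the implementation provides std::uintptr_t.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the first character of a C-style string.
char first_char(const char* s) {
    assert(s != nullptr);
    return s[0];  // same as *s; for "Q" this is 'Q'
}

// Hypothetical helper: roughly what the cast in the question computes.
// The result depends on where the string happens to live in memory.
unsigned char low_byte_of_address(const char* s) {
    return static_cast<unsigned char>(reinterpret_cast<std::uintptr_t>(s));
}
```

first_char("Q") always yields 'Q', whereas low_byte_of_address("Q") can differ between builds and platforms.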
A string literal such as "C" is an array of characters. char("C") converts the array into a character. In value contexts such as this, an array implicitly decays into pointer to first element. Therefore this is a conversion from pointer to an integer type (character types are integer types).
There exists no conversion from pointer to integer types except to such integer types that can represent all pointer values. As such char("C") is an ill-formed expression on any system where char cannot represent all pointer values.
A compiler that does not diagnose the error does not conform to the standard. If a compiler successfully compiles an ill-formed program, it's completely up to the compiler how it should behave; it is out of the scope of the language standard.

Why can a char pointer variable be initialized to a string but an int pointer variable can not be initialized to an array of integers? [duplicate]

I am trying to understand the relationship between strings, arrays, and pointers.
The book I am reading has a program in which it initializes a variable as follows:
char* szString= "Name";
The way I understand this, is that a C-style string is simply an array of chars. An array is simply a shorthand version of referring to the pointer (which stores the first value of the array) and an offset. I.e.
array[5] in fact returns what is evaluated from expression *(array+5).
So, from my understanding and testing, szString is in fact initialized as a pointer which points to the first address of the array storing "Name". I can deduce this because the output of:
cout << *szString;
is the character "N".
My understanding of the statement
cout << szString;
outputting the characters "Name", is that the method cout interprets the argument szString as a string type and prints out all the characters until the NUL character. On the other hand, for the argument *szString a different version of this method is used that supports C-style strings.
Therefore, if I can initialize a char type pointer to address the first element in an array of chars (a C-style string), why can I not initialize an int type pointer to the first element in an array of integers as follows:
int* intArray = {1,2,3};
a C-style string is simply an array of chars
Correct.
An array is simply a shorthand version of referring to the pointer (which stores the first value of the array) and an offset.
No, not really.
the method cout interprets the argument szString as a string type and prints out all the characters until the NUL character
cout is not a "method", but its operator<< works this way yes.
Why can a char pointer variable be initialized to a string but an int pointer variable can not be initialized to an array of integers?
The simple answer is that string literals are special, otherwise we would not be able to use them.
In many ways, including this way, the language standards dictate special handling for both string literals and char*s.
why can I not initialize an int type pointer to the first element in an array of integers
C++ could have ultimately extended the syntax of other pointer initialisations to do a similar thing, but it didn't actually need to because instead we have the far superior:
std::vector<int> myInts{1,2,3};
The short answer is that there exist character array literals, but no int array literals.
A string literal is a literal value of array type, and it is an lvalue, so that's something whose address you can take and store. The lifetime of the object designated by such a value is permanent, so pointers thus obtained are valid throughout the entire program.
By contrast, there is no literal of type "array of int", and no unnamed int array lvalues.
Don't confuse this with the braced initialization lists, which are not expressions and therefore not values! Braced lists can be used to initialize variables of array type, but they are not themselves values.
If anything, the only odd-man-out in the language grammar is that it is permissible to initialize a char array with a braced list containing a string literal: char a[] = {"foo"}; Think of this as a kind of copy initialization; a is a copy of the literal lvalue.
As a beginner I had a similar question. Please look at this post and the answers.
This const char* szString= "Name" assigns to the pointer szString the address of the initial element of an array whose contents are "Name" (followed by a terminating '\0' null character).
There's no implicit conversion from int to int*, other than 0 being a special case, as a null pointer.
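The points made in the answers above can be put side by side in a short sketch. All variable names here are made up for illustration.

```cpp
#include <vector>

// A string literal is an array lvalue with static lifetime, so a pointer
// to its first element can be stored directly.
const char* sz_string = "Name";

// There is no int array literal: a braced list only initializes a named
// object. That named array can then decay to a pointer.
const int numbers[] = {1, 2, 3};
const int* first_number = numbers;

// The standard-library alternative suggested above.
const std::vector<int> my_ints{1, 2, 3};

// A char array initialized from a literal holds its own copy.
char copy_of_literal[] = {"foo"};
```

Here sz_string points into the literal itself, while copy_of_literal is a separate, modifiable array of four chars.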

Are C strings guaranteed to be arrays?

Are C strings (as opposed to std::string) guaranteed to be implemented as arrays? For example, say, I have
char const * str = "abc";
What it boils down to is whether or not str + 4 is a legal pointer value (without dereferencing it, that is). I'm asking this because I don't know if C strings are a special case due to the null character terminating them.
First part of the question
Are C strings guaranteed to be implemented as arrays?
For example, say, I have: char const * str = "abc"
Yes, a string object is of an array type. A character string is a data format and a (character) string object is of a type array of char.
In your example str points to the string literal "abc". Character string literals have the type char[N+1] where N is the length of the string (i.e., the number of characters excluding the terminating null character).
Some references from Standard and K&R 2nd edition:
C defines a string literal as:
(C99, 6.4.5p2) "A character string literal is a sequence of zero or more multibyte characters enclosed in double-quotes, as in "xyz"."
and says (emphasis mine):
(C99, 6.4.5p5) "For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence;"
K&R 2nd edition says:
"Technically, a string constant is an array of characters"
and
"when a string constant like "hello\n" appears in a C program, it is stored as an array of characters containing the characters in the string and terminated with a '\0' to mark the end."
Second part of the question
What it boils down to is whether or not str + 4 is a legal pointer value (without dereferencing it, that is).
Yes, it is a valid pointer. In your case str + 4 is a pointer one past the last element of the array.
A valid pointer is a pointer that is either a null pointer or a pointer to a valid object. For an element of an array object, a pointer one past the last element of the array object is also a valid pointer.
Note that for the purpose of the last rule ("the one past element"), for pointers to objects that are not elements of an array, C treats the object as an array of one element.
(C99, 6.5.6p7) "For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type."
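For the literal "abc", the pointer arithmetic described above can be sketched as follows. The helper name bytes_including_terminator is hypothetical; it forms the pointer just past the terminator but never dereferences it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: distance from the start of a C-style string to the
// position just past its '\0'.
std::size_t bytes_including_terminator(const char* str) {
    assert(str != nullptr);
    const char* p = str;
    while (*p != '\0') {
        ++p;  // stops on the terminator (str + 3 for "abc")
    }
    ++p;      // just past the terminator: fine to form and compare,
              // but it must not be dereferenced
    return static_cast<std::size_t>(p - str);
}
```

For "abc" the result is 4, matching the array type char[4] (const char[4] in C++), and str + 4 is the one-past-the-end pointer of that array.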
They are guaranteed to be a contiguous sequence of chars. If that's your definition of an array, then yes.
In your example you will have 4 chars, one for each character and one for the null terminator. str+4 will be out of range.
Are C strings guaranteed to be implemented as arrays?
With a wide definition of array, yes, they are a contiguous sequence of chars with a terminating null character.
What it boils down to is whether or not str + 4 is a legal pointer value
The literal ("abc") is an array stored somewhere in the process memory. The type is const char[4] (in C++, I am not sure if in C it is char[4]). Then str is a pointer to the first element of the string literal, and the expression str+3 is correct, can be dereferenced and the pointed character will be 0. The expression str+4 is a pointer beyond the end of the array and cannot be dereferenced.
The short answer is: yes, they are, but str+4 isn't necessarily a legal pointer as 1 char may not be equal to 1 byte.

Is strncpy() a specialization of memcpy()?

Just curious to know (as we use these functions often). I don't see any practical difference between strncpy() and memcpy(). Isn't it fair to say that, effectively,
char* strncpy (char *dst, const char *src, size_t size)
{
return (char*)memcpy(dst, src, size);
}
Or am I missing any side effect? There is one similar earlier question, but couldn't find an exact answer.
There is a difference, see this part of the strncpy page you linked to (emphasis mine):
Copies the first num characters of source to destination. If the end of the source C string (which is signaled by a null-character) is found before num characters have been copied, destination is padded with zeros until a total of num characters have been written to it.
So if the string to be copied is shorter than the limit, strncpy pads with zeros, while memcpy keeps reading beyond the end of the string (possibly invoking undefined behaviour).
No, they are not the same.
From the C Standard (ISO/IEC 9899:1999 (E))
7.21.2.4 The strncpy function
Description
2 The strncpy function copies not more than n characters (characters that follow a null character are not copied) from the array pointed to by s2 to the array pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.
3 If the array pointed to by s2 is a string that is shorter than n characters, null characters are appended to the copy in the array pointed to by s1, until n characters in all have been written.
Returns
4 The strncpy function returns the value of s1.
7.21.2.1 The memcpy function
Description
2 The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.
Returns
3 The memcpy function returns the value of s1.
According to the C standard, the behavior for overlapping buffers is undefined for both strncpy() and memcpy(); neither may be used when the source and destination overlap (memmove() exists for that case).
According to the C standard, the real difference between strncpy() and memcpy() is that if the source string is shorter than N characters, null characters are appended to the destination until N characters in total have been written.
memcpy() is more efficient, but less safe, since it doesn't check whether the source actually contains N characters to copy into the target buffer.
No, strncpy() is not a specialization, since it will detect a '\0' character during the copy and stop, something memcpy() will not do.
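The padding behaviour is easy to observe by filling a buffer with a marker byte first. The following sketch uses hypothetical helper names and a fixed 8-byte destination chosen only for the demonstration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: how many of the first n bytes of buf are '\0'.
std::size_t count_nulls(const char* buf, std::size_t n) {
    assert(buf != nullptr);
    std::size_t zeros = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (buf[i] == '\0') ++zeros;
    }
    return zeros;
}

// Copies src into an 8-byte buffer pre-filled with 'x' using strncpy.
std::size_t nulls_after_strncpy(const char* src) {
    char dst[8];
    std::memset(dst, 'x', sizeof dst);
    std::strncpy(dst, src, sizeof dst);  // stops at '\0', then pads with '\0'
    return count_nulls(dst, sizeof dst);
}

// Same, but with memcpy. Precondition: src has at least n readable bytes.
std::size_t nulls_after_memcpy(const char* src, std::size_t n) {
    assert(n <= 8);
    char dst[8];
    std::memset(dst, 'x', sizeof dst);
    std::memcpy(dst, src, n);            // copies exactly n bytes, never pads
    return count_nulls(dst, sizeof dst);
}
```

For the source "ab", strncpy leaves six null bytes in the buffer (one terminator plus five of padding), while memcpy with n = 3 leaves exactly one. With a source of eight or more characters, strncpy writes no null character at all, so the destination is not a valid string.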
Adding on to what the others have said, the type of the src and dst pointers does not matter. That is, I can copy a 4 byte integer to 4 consecutive characters of 1 byte like this:
int num = 5;
char arr[4];
memcpy(arr, &num, 4);
Another difference is that memcpy does not look for any particular character (such as the null character, which strncpy checks for). It blindly copies num bytes from source to destination.
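Because memcpy only moves bytes, an int can be copied into a byte buffer and back without loss. Below is a minimal sketch of that round trip; the function name is hypothetical, and sizeof is used instead of the literal 4 so that it does not assume a particular int size.

```cpp
#include <cstring>

// Hypothetical helper: copies an int into raw bytes and back again.
int round_trip_through_bytes(int value) {
    unsigned char bytes[sizeof(int)];
    std::memcpy(bytes, &value, sizeof value);  // int -> bytes, no inspection
    int copy = 0;
    std::memcpy(&copy, bytes, sizeof copy);    // bytes -> int
    return copy;
}
```

Zero bytes inside the value are copied like any others, which is exactly what strncpy would not do.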
You could potentially make strncpy faster by checking for a \0 and not copying past that point. So memcpy would always copy all the data, but strncpy would often be faster because of the check.