Are C strings guaranteed to be arrays? - c++

Are C strings (as opposed to std::string) guaranteed to be implemented as arrays? For example, say, I have
char const * str = "abc";
What it boils down to is whether or not str + 4 is a legal pointer value (without dereferencing it, that is). I'm asking this because I don't know if C strings are a special case due to the null character terminating them.

First part of the question
Are C strings guaranteed to be implemented as arrays?
For example, say, I have: char const * str = "abc"
Yes, a string object has an array type. A character string is a data format, and a (character) string object has the type array of char.
In your example str points to the string literal "abc". Character string literals have the type char[N+1] where N is the length of the string (i.e., the number of characters excluding the terminating null character).
Some references from the C Standard and K&R, 2nd edition:
C defines a string literal as:
(C99, 6.4.5p2) "A character string literal is a sequence of zero or more multibyte characters enclosed in double-quotes, as in "xyz"."
and says (emphasis mine):
(C99, 6.4.5p5) "For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence;"
K&R 2nd edition says:
"Technically, a string constant is an array of characters"
and
"when a string constant like "hello\n" appears in a C program, it is stored as an array of characters containing the characters in the string and terminated with a '\0' to mark the end."
Second part of the question
What it boils down to is whether or not str + 4 a legal pointer value (without dereferencing that is).
Yes, it is a valid pointer. In your case str + 4 is a pointer one past the last element of the array.
A valid pointer is a pointer that is either a null pointer or a pointer to a valid object. For an element of an array object, a pointer one past the last element of the array object is also a valid pointer.
Note that for the purpose of the last rule ("the one past element"), for pointers to objects that are not elements of an array, C treats the object as an array of one element.
(C99, 6.5.6p7) "For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type."

They are guaranteed to be a contiguous sequence of chars. If that's your definition of an array, then yes.
In your example you will have 4 chars, one for each character and one for the null terminator. str+4 points one past the end of that sequence, so it is out of range for dereferencing.

Are C strings guaranteed to be implemented as arrays?
With a wide definition of array, yes, they are a contiguous sequence of chars with a terminating null character.
What it boils down to is whether or not str + 4 a legal pointer value
The literal ("abc") is an array stored somewhere in the process memory. The type is const char[4] in C++ (in C it is char[4]). Then str is a pointer to the first element of the string literal, and the expression str+3 is correct, can be dereferenced, and the pointed-to character will be 0. The expression str+4 is a pointer one past the end of the array and cannot be dereferenced.

The short answer is: yes, they are, but str+4 isn't necessarily a legal pointer as 1 char may not be equal to 1 byte.

Related

How to get the size of an array using a pointer to the first element and properties of "\0"? [duplicate]

Given a pointer to the first element of an array, how can I get the size of the array? For example, if I were given a but not arr here:
int arr[]={1,2,3};
int* a=arr;
Also could anyone explain a bit about the properties of "\0" in the array?
My understanding of "\0" is that it marks the end of the array. Is that correct? And is there more I should pay attention to?
Does "\0" work for all types of arrays or only char type? If it works only for char type, are there any similar symbols for other types of arrays?
Will '\0' be automatically inserted at the end of the array? Or it will be automatically inserted only under some specific conditions?
Given:
int arr[]={1,2,3};
int* a=arr;
you cannot get the size of the array given just the value of a. The implicit conversion from int[3] to int* is lossy; it discards any information about the size of the array, and yields a pointer to the array's initial element.
The null character '\0' marks the end of a C-style string (a null-terminated character string) and can be used to determine the length of the string -- not the size of the array that contains it.
There is no such convention for arrays of non-character types. Of course you can define such a convention in your own code, but you're going to have problems if 0 is a valid element value.
If you need to do this kind of thing, you're better off using C++'s standard library. For example, the type std::vector<int> holds a sequence of int values and keeps track of its length for you. If you really need to write C-style code that deals with arrays via pointers to their elements, you'll need to track the array length yourself.
String literals do automatically provide a terminating null character, but again that applies only to strings, not to arrays in general.
I wonder given a pointer to the first element of an array, how can I get the size of the array? eg. int arr[]={1,2,3};int* a=arr, given a(not arr), how to get the size of the array?
You generally cannot get the size using only the pointer.
1.my understanding of "\0" is that it marks the end of the array, is that correct?
You can choose any value to represent the end of an array.
"\0" isn't typically used to represent the end of an array.
'\0', i.e. the null character, is often used to represent the end of a character string. Such a string is called a "null-terminated string".
Does "\0" work for all types of arrays or only char type?
"\0" is a string literal. Strings are arrays of chars. An element of an array of chars is not an array of chars, but a char.
Will '\0' be automatically inserted at the end of the array?
No, '\0' will not be inserted at the end of the array. Not all arrays even contain elements of a type that a char could be converted into. However, there is an implicit '\0' at the end of every string literal.

Why can a char pointer variable be initialized to a string but an int pointer variable can not be initialized to an array of integers? [duplicate]

I am trying to understand the relationship between strings, arrays, and pointers.
The book I am reading has a program in which it initializes a variable as follows:
char* szString= "Name";
The way I understand this, is that a C-style string is simply an array of chars. An array is simply a shorthand version of referring to the pointer (which stores the first value of the array) and an offset. I.e.
array[5] in fact returns what is evaluated from expression *(array+5).
So, from my understanding and testing the szString is in fact initialized as a pointer which points to the first address of the array storing "Name". I can deduce this because the output to:
cout << *szString;
is the character "N".
My understanding of the statement
cout << szString;
outputting the characters "Name", is that the method cout interprets the argument szString as a string type and prints out all the characters until the NUL character. On the other hand, for argument *szString a different version of this method is used that supports C-style strings.
Therefore, if I can initialize a char type pointer to address the first element in an array of chars (a C-style string), why can I not initialize an int type pointer to the first element in an array of integers as follows:
int* intArray = {1,2,3};
a C-style string is simply an array of chars
Correct.
An array is simply a shorthand version of referring to the pointer (which stores the first value of the array) and an offset.
No, not really.
the method cout interprets the argument szstring as a string type and prints out all the characters until the NUL character
cout is not a "method", but its operator<< works this way yes.
Why can a char pointer variable be initialized to a string but an int pointer variable can not be initialized to an array of integers?
The simple answer is that string literals are special, otherwise we would not be able to use them.
In many ways, including this way, the language standards dictate special handling for both string literals and char*s.
why can I not initialize an int type pointer to the first element in an array of integers
C++ could have ultimately extended the syntax of other pointer initialisations to do a similar thing, but it didn't actually need to because instead we have the far superior:
std::vector<int> myInts{1,2,3};
The short answer is that there exist character array literals, but no int array literals.
A string literal is a literal value of array type, and it is an lvalue, so that's something whose address you can take and store. The lifetime of the object designated by such a value is permanent, so pointers thus obtained are valid throughout the entire program.
By contrast, there is no literal of type "array of int", and no unnamed int array lvalues.
Don't confuse this with the braced initialization lists, which are not expressions and therefore not values! Braced lists can be used to initialize variables of array type, but they are not themselves values.
If anything, the only odd-man-out in the language grammar is that it is permissible to initialize a char array with a braced list containing a string literal: char a[] = {"foo"}; Think of this as a kind of copy initialization; a is a copy of the literal lvalue.
As a beginner I had a similar question. Please look at this post and the answers.
This const char* szString= "Name" assigns to the pointer szString the address of the initial element of an array whose contents are "Name" (followed by a terminating '\0' null character).
There's no implicit conversion from int to int*, other than 0 being a special case, as a null pointer.

Why does strlen() apply on character arrays also?

strlen() is a function whose argument should be a string, so why is it also applicable to character arrays?
For example:
char abc[100];
cin.getline(abc,100);
len=strlen(abc);
If it works for a character array and tells the number of elements, can it also be used for an int array?
Note: I am using Turbo C++
It's important to understand what a string is. The C standard defines a string as "a contiguous sequence of characters terminated by and including the first null character". It's a data format, not a data type; a C-style string may be contained in an array of char. This is not to be confused with the C++-specific type std::string, defined in the <string> (note: no .h suffix) header.
The <string.h> header, or preferably the <cstring> header, is incorporated into C++ from the C standard library. The functions declared in that header operate on C-style strings, or on pointers to them.
The argument to strlen is of type char*, a pointer to a character. (It's actually const char*, meaning that strlen promises not to modify whatever it points to.)
An array expression is, in most contexts, implicitly converted to a pointer to the initial element of the array. (See section 6 of the comp.lang.c FAQ for the details.)
The char* argument that you pass to strlen must point to the initial element of an array of characters, and there must be a null character ('\0') somewhere in the array to mark the end of the string. It computes the number of characters up to, but not including, the null terminator.
It does not (and cannot) compute the number of elements in an array, only the number of characters in a string -- which it can do only if the array actually contains a valid string. If there is no null character anywhere in the array, or if the pointer is null or otherwise invalid, the behavior is undefined.
So when you write:
char abc[100];
cin.getline(abc,100);
len=strlen(abc);
the call to cin.getline ensures that the array abc contains a properly null-terminated string. strlen(abc) calls strlen, passing it the address of the initial character; it's equivalent to strlen(&abc[0]).
No, strlen will not work on an array of int. For one thing, that would pass an int* value, which doesn't match the char* that strlen requires, so it probably wouldn't compile. Even ignoring that, strlen counts characters, not ints. (You can write your own similar function that counts ints if you like, but it still has to have some way to find the end of the elements that you're interested in counting. It doesn't have access to the actual length of the array unless you pass it explicitly.)
strlen only works for null-terminated char arrays.
If there's no null character at the end of the string (which is not necessarily the same as the end of the array), calling strlen causes undefined behavior.
std::cin.getline automatically appends the null character at the end, so that's why your strlen worked.
In C, strings are arrays of char. All strlen does is count the number of items until a 0 (or null character) is found.

char* and char arr[] Difference - C++/C [duplicate]

Just starting out in C++, I was wondering if someone could explain something.
I believe you can initialise a char array in the following way
char arr[] = "Hello";
This will create a char array with the values 'H', 'e', 'l', 'l', 'o', '\0'.
But if I do create this:
char* cp = "Hello";
Will that create an array, and the pointer to that array?
Eg: cp will point to the first element ('H') in memory, with the additional elements of the array?
The string literal itself has array type. So in the first example you gave, there are actually two arrays involved. The first is the array containing the string literal and the second is the array arr that you're declaring. The characters from the string literal are copied into arr. The C++11 wording is:
A char array (whether plain char, signed char, or unsigned char), char16_t array, char32_t array, or wchar_t array can be initialized by a narrow character literal, char16_t string literal, char32_t string literal, or wide string literal, respectively, or by an appropriately-typed string literal enclosed in braces. Successive characters of the value of the string literal initialize the elements of the array.
In the second example, you are letting the string literal array undergo array-to-pointer conversion to get a pointer to its first element. So your pointer is pointing at the first element of the string literal array.
However, note that your second example relies on the implicit conversion from a string literal to char*, which is deprecated in C++03 and removed in C++11. For valid C++11, it would have to instead be:
const char* cp = "Hello";
If you do use the conversion to char* in C++03 or in C, you must make sure you don't attempt to modify the characters, otherwise you'll have undefined behaviour.
An array converts, in most expressions, to a pointer to its first element, but it is not itself a pointer. A pointer is just a pointer, which can point to any memory location. So given the pointer p, p[3] means *(p+3), and accessing it is undefined behavior (often a segmentation fault) unless p points into an array with at least 4 elements (for example int *p = new int[4];). Indexing works the same way for int p[4];, except that p is then an array and cannot be made to refer to different storage.

What is the difference between char a[] = ?string?; and char *p = ?string?;?

As the heading says, What is the difference between
char a[] = ?string?; and
char *p = ?string?;
This question was asked to me in interview.
I even dont understand the statement.
char a[] = ?string?
Here what is ? operator? Is it a part of a string or it has some specific meaning?
The ? seems to be a typo, it is not semantically valid. So the answer assumes the ? is a typo and explains what probably the interviewer actually meant to ask.
The two are distinctly different. For a start:
The first creates an array.
The second creates a pointer.
Read on for more detailed explanation:
The Array version:
char a[] = "string";
Creates an array that is large enough to hold the string literal "string", including its null terminator. The array a is initialized with the string literal "string". The array can be modified at a later time. Also, the array's size is known at compile time, so the sizeof operator can be used to determine its size.
The pointer version:
char *p = "string";
Creates a pointer that points to the string literal "string". No characters are copied, so this is cheaper than the array version, but the string pointed to should not be changed, because it may be located in read-only memory. Modifying a string literal results in undefined behavior.
In fact C++03 deprecates[Ref 1] use of string literal without the const keyword. So the declaration should be:
const char *p = "string";
Also, you need to use the strlen() function, not sizeof, to find the length of the string, since the sizeof operator will just give you the size of the pointer variable.
Which version is better and which one shall I use?
Depends on the Usage.
If you do not need to make any changes to the string, use the pointer version.
If you intend to change the data, use the array version.
Note: the following is specific to C, not C++.
The use of a string literal without the const keyword is perfectly valid in C.
However, modifying a string literal is still undefined behavior in C[Ref 2].
This brings up an interesting question,
What is the difference between char* and const char* when used with string literals in C?
For Standardese Fans:
[Ref 1]C++03 Standard: §4.2/2
A string literal (2.13.4) that is not a wide string literal can be converted to an rvalue of type “pointer to char”; a wide string literal can be converted to an rvalue of type “pointer to wchar_t”. In either case, the result is a pointer to the first element of the array. This conversion is considered only when there is an explicit appropriate pointer target type, and not when there is a general need to convert from an lvalue to an rvalue. [Note: this conversion is deprecated. See Annex D. ] For the purpose of ranking in overload resolution (13.3.3.1.1), this conversion is considered an array-to-pointer conversion followed by a qualification conversion (4.4). [Example: "abc" is converted to “pointer to const char” as an array-to-pointer conversion, and then to “pointer to char” as a qualification conversion. ]
C++11 removes the conversion quoted above, which makes such code ill-formed in C++11.
[Ref 2]C99 standard 6.4.5/5 "String Literals - Semantics":
In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence; for wide string literals, the array elements have type wchar_t, and are initialized with the sequence of wide characters...
It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
The first one is array the other is pointer.
The array declaration char a[6]; requests that space for six characters be set aside, to be known by the name a. That is, there is a location named a at which six characters can sit. The pointer declaration char *p; on the other hand, requests a place which holds a pointer. The pointer is to be known by the name p, and can point to any char (or contiguous array of chars) anywhere.
The statements
char a[] = "string";
char *p = "string";
would result in data structures which could be represented like this:
   +---+---+---+---+---+---+----+
a: | s | t | r | i | n | g | \0 |
   +---+---+---+---+---+---+----+
   +-----+     +---+---+---+---+---+---+---+
p: |  *======> | s | t | r | i | n | g |\0 |
   +-----+     +---+---+---+---+---+---+---+
It is important to realize that a reference like x[3] generates different code depending on whether x is an array or a pointer. Given the declarations above, when the compiler sees the expression a[3], it emits code to start at the location a, move three elements past it, and fetch the character there. When it sees the expression p[3], it emits code to start at the location p, fetch the pointer value there, add three element sizes to the pointer, and finally fetch the character pointed to. In the example above, both a[3] and p[3] happen to be the character i, but the compiler gets there differently.
Source: comp.lang.c FAQ list · Question 6.2
char a[] = "string";
This allocates the string on the stack.
char *p = "string";
This creates a pointer on the stack that points to the literal in the data segment of the process.
? is whoever wrote it not knowing what they were doing.
On a typical implementation, the stack, heap, data segment (and BSS), and text segment are the four segments of process memory. All local variables are on the stack. Memory allocated dynamically using malloc and calloc is on the heap. All global and static variables are in the data segment. The text segment holds the program's machine code and some constants.
Of these four segments, the text segment is read-only; the other three are readable and writable.
char a[] = "string"; - This statement allocates 7 bytes on the stack (because a is a local variable) and stores the 6 characters (s, t, r, i, n, g) plus the null character (\0) at the end.
char *p = "string"; - This statement allocates 4 bytes (on a 32-bit machine) on the stack (because p is also a local variable) to hold a pointer to the constant string "string". The 7 bytes of the constant string (6 characters plus the terminator) are in the text segment. This is a constant value; the pointer variable p just points to it.
Now a[0] (the index can be 0 to 5 for the characters) accesses the first character of the string on the stack, so we can also write to this position: a[0] = 'x'. This operation is allowed because we have read-write access to the stack.
But p[0] = 'x' is undefined behavior and will typically crash, because we have only read access to the text segment; a write there causes a segmentation fault.
But you can change the value of the variable p itself, because it is a local variable on the stack, as below:
char *p = "string";
printf("%s", p);
p = "start";
printf("%s", p);
This is allowed. Here we are changing the address stored in the pointer variable p to the address of the string "start" (again, "start" is read-only data in the text segment). If you want to modify the characters that p points to, use dynamically allocated memory.
char *p = NULL;
p = malloc(sizeof(char)*7);
strcpy(p, "string");
Now p[0] = 'x' operation is allowed, because now we are writing in heap.
char *p = "string"; creates a pointer to read-only memory where string literal "string" is stored. Trying to modify string that p points to leads to undefined behaviour.
char a[] = "string"; creates an array and initializes its content by using string literal "string".
They do differ as to where the memory is stored. Ideally the second one should use const char *.
The first one
char buf[] = "hello";
creates an automatic buffer big enough to hold the characters and copies them in (including the null terminator).
The second one
const char * buf = "hello";
should use const and simply creates a pointer that points at memory usually stored in static space where it is illegal to modify it.
The converse (of the fact you can modify the first safely and not the second) is that it is safe to return the second pointer from a function, but not the first. This is because the second one will remain a valid memory pointer outside the scope of the function, the first will not.
const char * sayHello()
{
const char * buf = "hello";
return buf; // valid
}
const char * sayHelloBroken()
{
char buf[] = "hello";
return buf; // invalid
}
a declares an array of char values -- an array of chars which is terminated.
p declares a pointer, which refers to an immutable, terminated, C string, whose exact storage location is implementation-defined. Note that this should be const-qualified (e.g. const char *p = "string";).
If you print it out using std::cout << "a: " << sizeof(a) << "\np: " << sizeof(p) << std::endl;, you will see the difference in their sizes (note: values may vary by system):
a: 7
p: 8
Here what is ? operator? Is it a part of a string or it has some specific meaning?
char a[] = ?string?
I assume they were once double quotes "string", which potentially were converted to "smart quotes", then could not be represented as such along the way, and were converted to ?.
C and C++ have very similar Pointer to Array relationships...
I can't speak for the exact memory locations of the two statements you are asking about, but I found these articles interesting and useful for understanding some of the differences between a char pointer declaration and a char array declaration.
For clarity:
C Pointer and Array relationship
C++ Pointer to an Array
I think it's important to remember that an array, in C and C++, converts in most expressions to a pointer to its first element (the array itself is not a pointer). Consequently you can perform pointer arithmetic on it.
char *p = "string"; <--- This is a pointer that points to the first address of a character string.
the following is also possible:
char *p;
char a[] = "string";
p = a;
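At this point p references the first memory address of a (the address of the first element)
and so *p == 's'
*(++p) == 't' and so on (or *(p+1) == 't'); note that *(p++) still yields 's', because post-increment returns the old position
and *(a+1) would also equal 't', but a++ does not compile, because an array cannot be modified the way a pointer variable can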