Just starting out in C++, I was wondering if someone could explain something.
I believe you can initialise a char array in the following way
char arr[] = "Hello";
This will create a char array with the values 'H', 'e', 'l', 'l', 'o', '\0'.
But if I do create this:
char* cp = "Hello";
Will that create an array, and the pointer to that array?
Eg: cp will point to the first element ('H') in memory, with the additional elements of the array?
The string literal itself has array type. So in the first example you gave, there are actually two arrays involved. The first is the array containing the string literal and the second is the array arr that you're declaring. The characters from the string literal are copied into arr. The C++11 wording is:
A char array (whether plain char, signed char, or unsigned char), char16_t array, char32_t array, or wchar_t array can be initialized by a narrow character literal, char16_t string literal, char32_t string literal, or wide string literal, respectively, or by an appropriately-typed string literal enclosed in braces. Successive characters of the value of the string literal initialize the elements of the array.
In the second example, you are letting the string literal array undergo array-to-pointer conversion to get a pointer to its first element. So your pointer is pointing at the first element of the string literal array.
However, note that your second example relies on the implicit conversion from a string literal to char*, a feature that is deprecated in C++03 and removed in C++11. For valid C++11, it would have to instead be:
const char* cp = "Hello";
If you do use the conversion to char* in C++03 or in C, you must make sure you don't attempt to modify the characters, otherwise you'll have undefined behaviour.
In most expressions an array converts to a pointer to its first element, so it behaves much like a constant pointer to the beginning of the array (although the array itself is not a pointer). A pointer is just a pointer, which can point to any memory location. So given the pointer p, p[3] means *(p+3), which is undefined behaviour (often a segmentation fault) unless p points to an array with at least 4 elements (int *p = new int[4];). Indexing works exactly the same for int p[4];, except that p then behaves like an int * const: it cannot be made to point elsewhere.
I am trying to understand the relationship between strings, arrays, and pointers.
The book I am reading has a program in which it initializes a variable as follows:
char* szString= "Name";
The way I understand this, is that a C-style string is simply an array of chars. An array is simply a shorthand version of referring to the pointer (which stores the first value of the array) and an offset. I.e.
array[5] in fact returns what is evaluated from expression *(array+5).
So, from my understanding and testing the szString is in fact initialized as a pointer which points to the first address of the array storing "Name". I can deduce this because the output to:
cout << *szString;
is the character "N".
My understanding of the statement
cout << szString;
outputting the characters "Name", is that the method cout interprets the argument szString as a string type and prints out all the characters until the NUL character. On the other hand for argument *szString a different version of this method is used that supports C-style strings.
Therefore, if I can initialize a char type pointer to address the first element in an array of chars (a C-style string), why can I not initialize an int type pointer to the first element in an array of integers as follows:
int* intArray = {1,2,3};
a C-style string is simply an array of chars
Correct.
An array is simply a shorthand version of referring to the pointer (which stores the first value of the array) and an offset.
No, not really.
the method cout interprets the argument szString as a string type and prints out all the characters until the NUL character
cout is not a "method", but yes, its operator<< works this way.
Why can a char pointer variable be initialized to a string but an int pointer variable can not be initialized to an array of integers?
The simple answer is that string literals are special, otherwise we would not be able to use them.
In many ways, including this way, the language standards dictate special handling for both string literals and char*s.
why can I not initialize an int type pointer to the first element in an array of integers
C++ could have ultimately extended the syntax of other pointer initialisations to do a similar thing, but it didn't actually need to because instead we have the far superior:
std::vector<int> myInts{1,2,3};
The short answer is that there exist character array literals, but no int array literals.
A string literal is a literal value of array type, and it is an lvalue, so that's something whose address you can take and store. The lifetime of the object designated by such a value is permanent, so pointers thus obtained are valid throughout the entire program.
By contrast, there is no literal of type "array of int", and no unnamed int array lvalues.
Don't confuse this with the braced initialization lists, which are not expressions and therefore not values! Braced lists can be used to initialize variables of array type, but they are not themselves values.
If anything, the only odd-man-out in the language grammar is that it is permissible to initialize a char array with a braced list containing a string literal: char a[] = {"foo"}; Think of this as a kind of copy initialization; a is a copy of the literal lvalue.
As a beginner I had a similar question. Please look at this post and the answers.
This const char* szString= "Name" assigns to the pointer szString the address of the initial element of an array whose contents are "Name" (followed by a terminating '\0' null character).
There's no implicit conversion from int to int*, other than 0 being a special case, as a null pointer.
I had always thought it fine to, in my mind, replace any use of a literal with a temporary variable of that literal's type and value. If this is the case, since string literals are of type array of const char would initialising a character array through a string literal not be considered array copy-initialisation? E.g. wouldn't
const char test1[] = "hello";
be somewhat the same as doing...
const char temp[6] = {'h', 'e', 'l', 'l', 'o', '\0'};
const char test2[] = temp;
which would be forbidden since this is an example of array copy initialisation? How is it that string literals can be used to initialise an array if the literal's type is an array? Maybe somewhat related, if string literals are of type array of const char then how is it the following code seems to compile fine on my system?
char* test3 = "hello";
Since test3 is missing the low-level const, I would expect the compiler to reject this unlawful conversion, but it compiles fine anyway? Of course trying to change any element through test3 causes the program to crash.
There is no difference between copy or direct-initialization for arrays. Both cases are handled identically by the compiler. The analogy you make in the beginning is more of a rule of thumb. In reality, an array cannot be initialized by another array unless it is a string literal. BTW your analogy is not entirely correct. The target array would be direct-initialized with the temporary array:
const char test2[](test1);
But this still won't compile for the same reason. This is how initialization of a character array works.
[dcl.init]/p17:
The semantics of initializers are as follows. The destination type is the type of the object or reference being initialized and the source type is the type of the initializer expression. If the initializer is not a single (possibly parenthesized) expression, the source type is not defined.
If the initializer is a (non-parenthesized) braced-init-list, the object or reference is list-initialized (8.5.4).
If the destination type is a reference type, see 8.5.3.
If the destination type is an array of characters, an array of char16_t, an array of char32_t, or an array of wchar_t, and the initializer is a string literal, see 8.5.2.
8.5.2:
An array of narrow character type (3.9.1), char16_t array, char32_t array, or wchar_t array can be initialized by a narrow string literal, char16_t string literal, char32_t string literal, or wide string literal,
respectively, or by an appropriately-typed string literal enclosed in braces (2.13.5). Successive characters of the value of the string literal initialize the elements of the array. [ Example:
char msg[] = "Syntax error on line %s\n";
shows a character array whose members are initialized with a string-literal. [..]
In your other example the string literal decays into a pointer to its first element, with which test3 is initialized. This code is invalid in C++11 [1], as the decayed pointer is const char*, but this was a valid conversion in C because string literals were non-const. It was allowed up to and including C++03, where it was deprecated.
[1] Some compilers still allow the conversion in C++11 as an extension.
How do NULL (or 0 or '\0') behave in an unsigned char array and a char array? In a char array NULL determines the end of the char array. Is it the same for an unsigned char array? If not, how can we determine the end of an unsigned char array?
The exact definition of NULL is implementation-defined; all that's guaranteed about it is that it's a macro that expands to a null pointer constant. In turn, a null pointer constant is "an integral constant expression (5.19) prvalue of integer type that evaluates to zero or a prvalue of type std::nullptr_t." It may or may not be convertible to char or unsigned char; it should only really be used with pointers.
0 is a literal of type int having a value of zero. '\0' is a literal of type char having a value of zero. Either is implicitly convertible to unsigned char, producing a value of zero.
It is purely a convention that a string in C and C++ is often represented as a sequence of chars that ends at the first zero value. Nothing prevents you from declaring a char array that doesn't follow this convention:
char embedded_zero[] = {'a', '\0', 'b'};
Of course, a function that expects its argument to follow the convention would stop at the first zero: strlen(embedded_zero) == 1;.
You can, of course, write a function that takes unsigned char* and follows a similar convention, requiring the caller to indicate the end of the sequence with an element having zero value. Or, you may write a function that takes a second parameter indicating the length of the sequence. You decide which approach better fits your design.
Strictly speaking, '\0' denotes the end of a string literal, not the end of just any char array. For example, if you declare an array without initializing it to a string literal, there would be no end marker in it.
If you initialize an array of unsigned chars with a string literal, though, you would get the same '\0' end marker as in a regular character array. In other words, in the code below
char s[] = "abc";
unsigned char u[] = "abc";
s[3] and u[3] contain identical values of '\0'.
Are C strings (as opposed to std::string) guaranteed to be implemented as arrays? For example, say, I have
char const * str = "abc";
What it boils down to is whether or not str + 4 is a legal pointer value (without dereferencing it, that is). I'm asking this because I don't know if C strings are a special case due to the null character terminating it.
First part of the question
Are C strings guaranteed to be implemented as arrays?
For example, say, I have: char const * str = "abc"
Yes, a string object is of an array type. A character string is a data format and a (character) string object is of a type array of char.
In your example str points to the string literal "abc". Character string literals have the type char[N+1] where N is the length of the string (i.e., the number of characters excluding the terminating null character).
Some references from Standard and K&R 2nd edition:
C defines a string literal as:
(C99, 6.4.5p2) "A character string literal is a sequence of zero or more multibyte characters enclosed in double-quotes, as in "xyz"."
and says (emphasis mine):
(C99, 6.4.5p5) "For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence;"
K&R 2nd edition says:
"Technically, a string constant is an array of characters"
and
"when a string constant like "hello\n" appears in a C program, it is stored as an array of characters containing the characters in the string and terminated with a '\0' to mark the end."
Second part of the question
What it boils down to is whether or not str + 4 is a legal pointer value (without dereferencing it, that is).
Yes, it is a valid pointer. In your case str + 4 is a pointer one past the last element of the array.
A valid pointer is a pointer that is either a null pointer or a pointer to a valid object. For an element of an array object, a pointer one past the last element of the array object is also a valid pointer.
Note that for the purpose of the last rule ("the one past element"), for pointers to objects that are not elements of an array, C treats the object as an array of one element.
(C99, 6.5.6p7) "For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type."
They are guaranteed to be a contiguous sequence of chars. If that's your definition of an array, then yes.
In your example you will have 4 chars, one for each character and one for the null terminator. str+4 points one past the end of that range, so it must not be dereferenced.
Are C strings guaranteed to be implemented as arrays?
With a wide definition of array, yes, they are a contiguous sequence of chars with a terminating null character.
What it boils down to is whether or not str + 4 is a legal pointer value
The literal ("abc") is an array stored somewhere in the process memory. Its type is const char[4] in C++ and char[4] in C. Then str is a pointer to the first element of the string literal, and the expression str+3 is correct, can be dereferenced and the pointed character will be 0. The expression str+4 is a pointer beyond the end of the array and cannot be dereferenced.
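The short answer is: yes, they are. Note that pointer arithmetic on a char pointer always steps one char at a time (sizeof(char) is 1 by definition, however many bits a byte has), so str+4 is the one-past-the-end pointer: legal to form, but not to dereference.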
As the heading says, What is the difference between
char a[] = ?string?; and
char *p = ?string?;
This question was asked to me in interview.
I even dont understand the statement.
char a[] = ?string?
Here what is ? operator? Is it a part of a string or it has some specific meaning?
The ? seems to be a typo, it is not semantically valid. So the answer assumes the ? is a typo and explains what probably the interviewer actually meant to ask.
Both are distinctly different, for a start:
The first creates an array.
The second creates a pointer.
Read on for more detailed explanation:
The Array version:
char a[] = "string";
Creates an array that is large enough to hold the string literal "string", including its null terminator. The array a is initialized with the string literal "string". The array can be modified at a later time. Also, the array's size is known at compile time, so the sizeof operator can be used to determine its size.
The pointer version:
char *p = "string";
Creates a pointer that points to the string literal "string". This avoids copying the characters, so it can be faster than the array version, but the string pointed to by the pointer must not be changed, because it may be located in read-only memory whose location is implementation-defined. Modifying such a string literal results in Undefined Behavior.
In fact C++03 deprecates[Ref 1] use of string literal without the const keyword. So the declaration should be:
const char *p = "string";
Also, you need to use the strlen() function, and not sizeof, to find the length of the string, since the sizeof operator will just give you the size of the pointer variable.
Which version is better and which one shall I use?
Depends on the Usage.
If you do not need to make any changes to the string, use the pointer version.
If you intend to change the data, use the array version.
Note: The following is specific to C, not C++.
Note that using a string literal without the const keyword is perfectly valid in C.
However, modifying a string literal is still Undefined Behavior in C[Ref 2].
This brings up an interesting question,
What is the difference between char* and const char* when used with string literals in C?
For Standardese Fans:
[Ref 1]C++03 Standard: §4.2/2
A string literal (2.13.4) that is not a wide string literal can be converted to an rvalue of type “pointer to char”; a wide string literal can be converted to an rvalue of type “pointer to wchar_t”. In either case, the result is a pointer to the first element of the array. This conversion is considered only when there is an explicit appropriate pointer target type, and not when there is a general need to convert from an lvalue to an rvalue. [Note: this conversion is deprecated. See Annex D. ] For the purpose of ranking in overload resolution (13.3.3.1.1), this conversion is considered an array-to-pointer conversion followed by a qualification conversion (4.4). [Example: "abc" is converted to “pointer to const char” as an array-to-pointer conversion, and then to “pointer to char” as a qualification conversion. ]
C++11 simply removes the above quotation which implies that it is illegal code in C++11.
[Ref 2]C99 standard 6.4.5/5 "String Literals - Semantics":
In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence; for wide string literals, the array elements have type wchar_t, and are initialized with the sequence of wide characters...
It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
The first one is array the other is pointer.
The array declaration char a[6]; requests that space for six characters be set aside, to be known by the name a. That is, there is a location named a at which six characters can sit. The pointer declaration char *p; on the other hand, requests a place which holds a pointer. The pointer is to be known by the name p, and can point to any char (or contiguous array of chars) anywhere.
The statements
char a[] = "string";
char *p = "string";
would result in data structures which could be represented like this:
   +---+---+---+---+---+---+----+
a: | s | t | r | i | n | g | \0 |
   +---+---+---+---+---+---+----+
   +-----+     +---+---+---+---+---+---+----+
p: |  *======> | s | t | r | i | n | g | \0 |
   +-----+     +---+---+---+---+---+---+----+
It is important to realize that a reference like x[3] generates different code depending on whether x is an array or a pointer. Given the declarations above, when the compiler sees the expression a[3], it emits code to start at the location a, move three elements past it, and fetch the character there. When it sees the expression p[3], it emits code to start at the location p, fetch the pointer value there, add three element sizes to the pointer, and finally fetch the character pointed to. In the example above, both a[3] and p[3] happen to be the character i, but the compiler gets there differently.
Source: comp.lang.c FAQ list · Question 6.2
char a[] = "string";
This allocates the array on the stack (assuming it is a local variable) and copies the string into it.
char *p = "string";
This creates a pointer on the stack that points to the literal in the data segment of the process.
? is whoever wrote it not knowing what they were doing.
Stack, heap, data segment (and BSS) and text segment are the four segments of process memory on a typical implementation. All local variables are on the stack. Memory dynamically allocated using malloc and calloc is on the heap. All global and static variables are in the data segment. The text segment holds the program's machine code and some constants.
Of these four segments, the text segment is READ ONLY and the other three are READ and WRITE.
char a[] = "string"; - This statement allocates 7 bytes on the stack (because a is a local variable) and stores the 6 characters (s, t, r, i, n, g) plus the null character (\0) at the end.
char *p = "string"; - This statement allocates 4 bytes (on a 32-bit machine) on the stack (because p is also a local variable) to hold a pointer to the constant string "string". The 7 bytes of the constant string (6 characters plus \0) are in the text segment. This is a constant value. The pointer variable p just points to that string.
Now a[0] (the index can be 0 to 5) accesses the first character of the string on the stack, so we can also write at this position: a[0] = 'x'. This operation is allowed because we have READ WRITE access to the stack.
But p[0] = 'x' is undefined behaviour and will typically crash, because we have only READ access to the text segment. A segmentation fault happens on any write to the text segment.
But you can change the value of the variable p itself, because it is a local variable on the stack, as below:
char *p = "string";
printf("%s", p);
p = "start";
printf("%s", p);
This is allowed. Here we are changing the address stored in the pointer variable p to the address of the string "start" (again, "start" is also read-only data in the text segment). If you want to modify the characters that p points to, use dynamically allocated memory.
char *p = NULL;
p = malloc(sizeof(char)*7);
strcpy(p, "string");
Now p[0] = 'x' operation is allowed, because now we are writing in heap.
char *p = "string"; creates a pointer to read-only memory where string literal "string" is stored. Trying to modify string that p points to leads to undefined behaviour.
char a[] = "string"; creates an array and initializes its content by using string literal "string".
They do differ as to where the memory is stored. Ideally the second one should use const char *.
The first one
char buf[] = "hello";
creates an automatic buffer big enough to hold the characters and copies them in (including the null terminator).
The second one
const char * buf = "hello";
should use const and simply creates a pointer that points at memory usually stored in static space where it is illegal to modify it.
The converse (of the fact you can modify the first safely and not the second) is that it is safe to return the second pointer from a function, but not the first. This is because the second one will remain a valid memory pointer outside the scope of the function, the first will not.
const char * sayHello()
{
    const char * buf = "hello";
    return buf; // valid
}
const char * sayHelloBroken()
{
    char buf[] = "hello";
    return buf; // invalid
}
a declares an array of char values -- an array of chars which is terminated.
p declares a pointer, which refers to an immutable, terminated, C string, whose exact storage location is implementation-defined. Note that this should be const-qualified (e.g. const char *p = "string";).
If you print it out using std::cout << "a: " << sizeof(a) << "\np: " << sizeof(p) << std::endl;, you will see the difference in their sizes (note: values may vary by system):
a: 7
p: 8
Here what is ? operator? Is it a part of a string or it has some specific meaning?
char a[] = ?string?
I assume they were once double quotes "string", which potentially were converted to "smart quotes", then could not be represented as such along the way, and were converted to ?.
C and C++ have very similar Pointer to Array relationships...
I can't speak for the exact memory locations of the two statements you are asking about, but I found these articles interesting and useful for understanding some of the differences between the char pointer declaration and a char array declaration.
For clarity:
C Pointer and Array relationship
C++ Pointer to an Array
I think it's important to remember that an array, in C and C++, converts to a pointer to its first element in most expressions, although the array itself is not a pointer. Consequently you can perform pointer arithmetic on the array.
char *p = "string"; <--- This is a pointer that points to the first address of a character string.
the following is also possible:
char *p;
char a[] = "string";
p = a;
At this point p references the first memory address of a (the address of the first element)
and so *p == 's'
*(++p) == 't' and so on (or *(p+1) == 't', which leaves p unchanged).
The same arithmetic works for a: *(a+1) would also equal 't'. Note that a++ does not compile, because an array cannot be incremented.