As the heading says, What is the difference between
char a[] = ?string?; and
char *p = ?string?;
I was asked this question in an interview.
I don't even understand the statement.
char a[] = ?string?
Here what is ? operator? Is it a part of a string or it has some specific meaning?
The ? seems to be a typo; it is not syntactically valid. So this answer assumes the ? stands for a double quote and explains what the interviewer probably meant to ask.
Both are distinctly different, for a start:
The first creates an array.
The second creates a pointer.
Read on for more detailed explanation:
The Array version:
char a[] = "string";
Creates an array that is large enough to hold the string literal "string", including its null terminator (7 bytes in total). The array a is initialized with a copy of the string literal "string". The array can be modified at a later time. Also, the array's size is known at compile time, so the sizeof operator can be used to determine its size.
The pointer version:
char *p = "string";
Creates a pointer that points to the string literal "string". No copy of the characters is made, so initialization can be cheaper than for the array version, but the string pointed to should not be changed, because it may be located in read-only, implementation-defined memory. Modifying such a string literal results in undefined behavior.
In fact C++03 deprecates[Ref 1] use of string literal without the const keyword. So the declaration should be:
const char *p = "string";
Also, you need to use the strlen() function, not sizeof, to find the length of the string, since the sizeof operator will just give you the size of the pointer variable. Note that strlen() does not count the null terminator.
Which version is better and which one shall I use?
Depends on the Usage.
If you do not need to make any changes to the string, use the pointer version.
If you intend to change the data, use the array version.
Note: The following is specific to C, not C++.
Note that use of a string literal without the const keyword is perfectly valid in C.
However, modifying a string literal is still undefined behavior in C[Ref 2].
This brings up an interesting question,
What is the difference between char* and const char* when used with string literals in C?
For Standardese Fans:
[Ref 1]C++03 Standard: §4.2/2
A string literal (2.13.4) that is not a wide string literal can be converted to an rvalue of type “pointer to char”; a wide string literal can be converted to an rvalue of type “pointer to wchar_t”. In either case, the result is a pointer to the first element of the array. This conversion is considered only when there is an explicit appropriate pointer target type, and not when there is a general need to convert from an lvalue to an rvalue. [Note: this conversion is deprecated. See Annex D. ] For the purpose of ranking in overload resolution (13.3.3.1.1), this conversion is considered an array-to-pointer conversion followed by a qualification conversion (4.4). [Example: "abc" is converted to “pointer to const char” as an array-to-pointer conversion, and then to “pointer to char” as a qualification conversion. ]
C++11 simply removes the above quotation, which implies that initializing a char * from a string literal is ill-formed in C++11.
[Ref 2]C99 standard 6.4.5/5 "String Literals - Semantics":
In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence; for wide string literals, the array elements have type wchar_t, and are initialized with the sequence of wide characters...
It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
The first one is an array; the other is a pointer.
The array declaration char a[6]; requests that space for six characters be set aside, to be known by the name a. That is, there is a location named a at which six characters can sit. The pointer declaration char *p; on the other hand, requests a place which holds a pointer. The pointer is to be known by the name p, and can point to any char (or contiguous array of chars) anywhere.
The statements
char a[] = "string";
char *p = "string";
would result in data structures which could be represented like this:
   +---+---+---+---+---+---+----+
a: | s | t | r | i | n | g | \0 |
   +---+---+---+---+---+---+----+
   +-----+     +---+---+---+---+---+---+---+
p: |  *======> | s | t | r | i | n | g |\0 |
   +-----+     +---+---+---+---+---+---+---+
It is important to realize that a reference like x[3] generates different code depending on whether x is an array or a pointer. Given the declarations above, when the compiler sees the expression a[3], it emits code to start at the location a, move three elements past it, and fetch the character there. When it sees the expression p[3], it emits code to start at the location p, fetch the pointer value there, add three element sizes to the pointer, and finally fetch the character pointed to. In the example above, both a[3] and p[3] happen to be the character i, but the compiler gets there differently.
Source: comp.lang.c FAQ list · Question 6.2
char a[] = "string";
This allocates the array on the stack (assuming it is a local variable) and copies the string into it.
char *p = "string";
This creates a pointer on the stack that points to the literal, which typically lives in a read-only data segment of the process.
? is whoever wrote it not knowing what they were doing.
Stack, heap, data segment (and BSS), and text segment are the four segments of process memory in a typical implementation. All local variables are placed on the stack. Memory allocated dynamically with malloc and calloc comes from the heap. All global and static variables are placed in the data segment. The text segment holds the machine code of the program and some constants.
Of these four segments, the text segment is READ ONLY, and the other three are READ and WRITE.
char a[] = "string"; - This statement allocates 7 bytes on the stack (because a is a local variable) and stores all 6 characters (s, t, r, i, n, g) plus the null character (\0) at the end.
char *p = "string"; - This statement allocates 4 bytes (on a 32-bit machine) on the stack (because p is also a local variable) to hold a pointer to the constant string "string". The 7 bytes of the constant string (6 characters plus the terminating \0) are typically placed in the text segment. This is a constant value. The pointer variable p just points to that string.
Now a[0] (the index can be 0 to 5) accesses the first character of the string, which is on the stack. So we can also write to this position: a[0] = 'x'. This operation is allowed because we have READ WRITE access to the stack.
But p[0] = 'x' is undefined behavior and will typically crash, because we have only READ access to the text segment. A segmentation fault usually occurs on any write to the text segment.
But you can change the value of the variable p, because it is a local variable on the stack, as shown below:
char *p = "string";
printf("%s", p);
p = "start";
printf("%s", p);
This is allowed. Here we are changing the address stored in the pointer variable p to the address of the string "start" (again, "start" is also read-only data in the text segment). If you want to modify the characters that p points to, use dynamically allocated memory.
char *p = NULL;
p = malloc(sizeof(char)*7);
strcpy(p, "string");
Now p[0] = 'x' operation is allowed, because now we are writing in heap.
char *p = "string"; creates a pointer to read-only memory where string literal "string" is stored. Trying to modify string that p points to leads to undefined behaviour.
char a[] = "string"; creates an array and initializes its content by using string literal "string".
They do differ as to where the memory is stored. Ideally the second one should use const char *.
The first one
char buf[] = "hello";
creates an automatic buffer big enough to hold the characters and copies them in (including the null terminator).
The second one
const char * buf = "hello";
should use const and simply creates a pointer that points at memory usually stored in static space where it is illegal to modify it.
The converse (of the fact you can modify the first safely and not the second) is that it is safe to return the second pointer from a function, but not the first. This is because the second one will remain a valid memory pointer outside the scope of the function, the first will not.
const char * sayHello()
{
const char * buf = "hello";
return buf; // valid
}
const char * sayHelloBroken()
{
char buf[] = "hello";
return buf; // invalid
}
a declares an array of char values -- an array of chars which is terminated.
p declares a pointer, which refers to an immutable, terminated, C string, whose exact storage location is implementation-defined. Note that this should be const-qualified (e.g. const char *p = "string";).
If you print it out using std::cout << "a: " << sizeof(a) << "\np: " << sizeof(p) << std::endl;, you will see the difference in their sizes (note: values may vary by system):
a: 7
p: 8
Here what is ? operator? Is it a part of a string or it has some specific meaning?
char a[] = ?string?
I assume they were once double quotes "string", which potentially were converted to "smart quotes", then could not be represented as such along the way, and were converted to ?.
C and C++ have very similar Pointer to Array relationships...
I can't speak for the exact memory locations of the two statements you are asking about, but I found these articles interesting and useful for understanding some of the differences between a char pointer declaration and a char array declaration.
For clarity:
C Pointer and Array relationship
C++ Pointer to an Array
I think it's important to remember that an array, in C and C++, is converted in most expressions to a pointer to the first element of the array (the array itself is not a pointer). Consequently, you can perform pointer arithmetic on the array.
char *p = "string"; <--- This is a pointer that points to the first address of a character string.
the following is also possible:
char *p;
char a[] = "string";
p = a;
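At this point p refers to the first memory address of a (the address of the first element)
and so *p == 's'
*(p+1) == 't' and so on (*(++p) == 't' as well, whereas *(p++) still yields 's' and only then advances p).
The same arithmetic works for a: *(a+1) also equals 't'. However, a++ does not compile, because an array is not a modifiable lvalue.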
Related
In a book I am reading, C++ from scratch, on page 113 the writer creates a char array:
myString[80];
then he uses a function that copies some characters:
strcpy(myString,"Hello there");
then he creates a pointer to the array:
*p1 = myString;
then he uses an offset and assigns a value at that offset:
p1[4]='c';
my question is, p1 is a pointer so it is a memory address, and the offset of 4 gives him the memory address 4 places further on, so that means he is assigning the letter 'c' to the memory address rather than to the value stored at that address. Shouldn't it be:
*(p1[4])='c';
basically, how come *(p1 + 4) needs dereferencing but p1[4] does not?
I tried to understand this and the only thing that could make sense to me was if the square brackets act as an asterisk to dereference the pointer. Is this correct or is there another reason why p1[4] does not need to be dereferenced?
then he creates a pointer to the array:
*p1 = myString;
Assuming, by this, you actually mean that p1 is declared and initialised using;
char *p1 = myString;
then your interpretation is wrong. p1 is a pointer to a char not a pointer to an array.
In this definition, myString is the (previously declared in your question) name of an array of char. In the initialisation
char *p1 = myString;
the name myString is converted to a pointer. That pointer will have the value &myString[0] (i.e. the address of the first character in myString). That is the value that p1 will receive.
The statement
p1[4] = 'c';
will then set the fifth character (since indexing is zero based) of myString to be 'c'. The result is therefore changing myString[4] to the value 'c'. This means that (the first 11 characters of) myString will be "Hellc there".
Assuming the above, the expression *(p1[4])='c' will not compile, since (p1[4]) is of type char, and cannot be dereferenced using the * operator.
Semantically, in an expression p1[4] is equivalent to *(p1 + 4). Since p1 is initialised to be equal to &myString[0], p1[4] is ALSO equivalent to both myString[4] and to *(myString + 4).
Note: If *(p1[4])='c' was valid in your code, then p1[4] = 'c' would not be valid, which suggests my assumption about the declaration and initialisation of p1 is correct - despite the fact you have omitted such information.
p1 + 4 would be the memory address 4 places beyond.
The expression p1[4] (exactly equivalent to *(p1+4)) is called an lvalue expression. We say that an lvalue expression designates a memory location. Or in other words, the lvalue expression is synonymous with the memory location itself. You can always use the & address-of operator on an lvalue expression, and that gives you a pointer to the memory location.
The lvalue expression will go on to be used in one of three possible ways:
Store a value in the designated memory location, or
Retrieve a value from the designated memory location, or
Neither of those.
There is no special syntax to distinguish between these three cases; rather, it depends on the larger expression of which the lvalue expression is a part. For example, applying the & operator is case 3; appearing on the left-hand side of the assignment operator = is case 1. Most other usages fall under case 2.
p1[4] is treated as *(p1 + 4) which is "value at an offset of 4 from p1".
Also when you declare a char array:
myString[80]
myString is an array; in most expressions its name is converted to a pointer to its first element. So when you write myString[4], it is also treated as *(myString + 4)
Writing *(p1[4]) would mean *(*(p1 + 4))
So from my understanding pointer variables point to an address. So, how is the following code valid in C++?
char* b= "abcd"; //valid
int *c= 1; //invalid
The first line
char* b= "abcd";
is valid in C, because a string literal, when used as an initializer for a pointer, boils down to the address of the first element of the literal, which is a pointer (to char).
Related, C11, chapter §6.4.5, string literals,
[...] The multibyte character
sequence is then used to initialize an array of static storage duration and length just
sufficient to contain the sequence. For character string literals, the array elements have
type char, and are initialized with the individual bytes of the multibyte character
sequence. [...]
and then, chapter §6.3.2.1 (emphasis mine)
Except when it is the operand of the sizeof operator, the _Alignof operator, or the
unary & operator, or is a string literal used to initialize an array, an expression that has
type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points
to the initial element of the array object and is not an lvalue.
However, as mentioned in the comments, from C++11 onwards this is no longer valid: string literals have type const char[N] in C++, and in your case the LHS lacks the const specifier.
OTOH,
int *c= 1;
is invalid (illegal) because, 1 is an integer constant, which is not the same type as int *.
In C and very old versions of C++, a string literal "abcd" is of type char[], a character array. Such an array can naturally get pointed at by a char*, but not by a int* since that's not a compatible type.
However, C and C++ are different, often incompatible programming languages. They dropped compatibility with each other some 20 years ago.
In standard C++, a string literal is of type const char[] and therefore none of your posted code is valid in C++. This won't compile:
char* b = "abcd"; //invalid, discards const qualifier
This will:
const char* c = "abcd"; // valid
"abcd" is actually a const char[5] type, and the language permits this to be assigned to a const char* (and, regrettably, a char* although C++11 onwards disallows it.).
int *c = 1; is not allowed by the C++ or C standards since you can't assign an int to an int* pointer (with the exception of 0, and in that case your intent will be expressed clearer by assigning nullptr instead).
"abcd" is the address that contains the sequence of five bytes 97 98 99 100 0 -- you cannot see what the address is in the source code, but the compiler will still assign it an address.
1 can also be treated as an address near the bottom of your [virtual] memory. This may not seem useful to you, but it is useful to other people, so even though the standard does not permit the implicit conversion, compilers support it with an explicit cast such as (int *)1, and some C compilers have traditionally accepted the uncast form with a warning.
While all the other answers correctly explain why your code doesn't work, using a compound literal to initialize c is one way you can make your code work, e.g.
int *c= (int[]){ 1 };
printf ("int pointer c : %d\n", *c);
Note that C and C++ differ in their support for compound literals: they are standard only in C (some C++ compilers accept them as an extension).
I am trying to understand the relationship between strings, arrays, and pointers.
The book I am reading has a program in which it initializes a variable as follows:
char* szString= "Name";
The way I understand this, is that a C-style string is simply an array of chars. An array is simply a shorthand version of referring to the pointer (which stores the first value of the array) and an offset. I.e.
array[5] in fact returns what is evaluated from expression *(array+5).
So, from my understanding and testing the szString is in fact initialized as a pointer which points to the first address of the array storing "Name". I can deduce this because the output to:
cout << *szString;
is the character "N".
My understanding of the statement
cout << szString;
outputting the characters "Name", is that the method cout interprets the argument szString as a string type and prints out all the characters until the NUL character. On the other hand for argument *szString a different version of this method is used that supports C-style strings.
Therefore, if I can initialize a char type pointer to address the first element in an array of chars (a C-style string), why can I not initialize an int type pointer to the first element in an array of integers as follows:
int* intArray = {1,2,3};
a C-style string is simply an array of chars
Correct.
An array is simply a shorthand version of referring to the pointer (which stores the first value of the array) and an offset.
No, not really.
the method cout interprets the argument szstring as a string type and prints out all the characters until the NUL character
cout is not a "method", but its operator<< works this way yes.
Why can a char pointer variable be initialized to a string but an int pointer variable can not be initialized to an array of integers?
The simple answer is that string literals are special, otherwise we would not be able to use them.
In many ways, including this way, the language standards dictate special handling for both string literals and char*s.
why can I not initialize an int type pointer to the first element in an array of integers
C++ could have ultimately extended the syntax of other pointer initialisations to do a similar thing, but it didn't actually need to because instead we have the far superior:
std::vector<int> myInts{1,2,3};
The short answer is that there exist character array literals, but no int array literals.
A string literal is a literal value of array type, and it is an lvalue, so that's something whose address you can take and store. The lifetime of the object designated by such a value is permanent, so pointers thus obtained are valid throughout the entire program.
By contrast, there is no literal of type "array of int", and no unnamed int array lvalues.
Don't confuse this with the braced initialization lists, which are not expressions and therefore not values! Braced lists can be used to initialize variables of array type, but they are not themselves values.
If anything, the only odd-man-out in the language grammar is that it is permissible to initialize a char array with a braced list containing a string literal: char a[] = {"foo"}; Think of this as a kind of copy initialization; a is a copy of the literal lvalue.
As a beginner I had a similar question. Please look at this post and the answers.
This const char* szString= "Name" assigns to the pointer szString the address of the initial element of an array whose contents are "Name" (followed by a terminating '\0' null character).
There's no implicit conversion from int to int*, other than 0 being a special case, as a null pointer.
Here is the code I'm having trouble to understand:
char* myPtr = "example";
myPtr[1] = 'x';
How am I allowed to use myPtr[1]? Why can I choose positions like I do on arrays? myPtr is not even an array.
Note: I know about lookup tables, literal pooling and string literals; my concern is just how this even compiles. I don't use pointers that much.
Can anyone help?
Apparently you made an assumption that applicability of [] operator to something necessarily implies that that "something" is an array. This is not true. The built-in [] operator has no direct relation to arrays. The [] is just a shorthand for a combination of * and + operators: by definition a[b] means *(a + b), where one operand is required to be a pointer and another is required to be an integer.
Moreover, when you apply the [] operator to an actual array, that array gets implicitly converted to a pointer type first, and only then the resultant pointer can act as an operand of [] operator. This actually means the opposite of what you supposedly assumed initially: operator [] never works with arrays. By the time we get to the [] the array has already decayed to a pointer.
As a related side-note, this latter detail manifests itself in one obscure peculiarity of the first C language standard. In C89/90 the array-to-pointer conversion was not allowed for rvalue arrays, which also prevented the [] operator from working with such arrays
struct S { int a[10]; };
struct S foo(void) { struct S s = { 0 }; return s; }
int main()
{
foo().a[5];
/* ERROR: cannot convert array to pointer, and therefore cannot use [] */
return 0;
}
C99 expanded the applicability of that conversion thus making the above code valid.
It compiles according to §5.2.1/1 [expr.sub] of the C++ standard:
A postfix expression followed by an expression in square brackets is a postfix expression. One of the expressions shall have the type “array of T” or “pointer to T” and the other shall have unscoped enumeration or integral type. The result is of type “T”. The type “T” shall be a completely-defined object type.
The expression E1[E2] is identical (by definition) to *((E1)+(E2)), except that in the case of an array operand, the result is an lvalue if that operand is an lvalue and an xvalue otherwise.
Since "example" has type char const[8], it may decay to char const* (it used to be convertible to char* as well, but that is mostly a relic of the past), which makes it a pointer.
At which point the expression myPtr[1] becomes *(myPtr + 1) which is well defined.
Pointers hold the address of a memory location holding a variable of the specific data type they are declared to point to. As others have pointed out, this counter-intuitive approach takes a bit of a learning curve to understand.
Note that the string "example" itself is immutable. However, the compiler doesn't prevent an attempt to write through the pointer variable: the second statement below tries to store the character 'x' into the second position of the literal's storage,
char* myPtr = "example";
myPtr[1] = 'x';
Since myPtr refers to immutable data, the behavior at run time is undefined and the program will typically crash, though it compiles without issues.
From the C perspective, here you are dereferencing a pointer to non-const char.
By default in C, the characters a char pointer points to are treated as mutable, unless the pointee is declared immutable with the keyword const, in which case you cannot modify the characters through the pointer (you can still assign another memory address to the pointer variable itself).
Lets say your code looked like this,
const char *ptr ="example";
ptr[1] = 'x';
Now the compilation will fail, and you cannot modify the value, because the characters this pointer points to are const.
You can use a char pointer to access the individual characters in a string of characters.
If you want to process characters read from standard input, I suggest you declare an int to store each character's value, because getchar() returns an int (so that EOF can be distinguished from any character), as in this example:
#include <stdio.h>

int main(void)
{
    int countBlank = 0, countTab = 0, countNewLine = 0, c;

    /* Copy stdin to stdout while counting blanks, tabs and newlines. */
    while ((c = getchar()) != EOF)
    {
        if (c == ' ')
            ++countBlank;
        else if (c == '\t')
            ++countTab;
        else if (c == '\n')
            ++countNewLine;
        putchar(c);
    }
    printf("Blanks = %d\nTabs = %d\nNew Lines = %d", countBlank, countTab, countNewLine);
    return 0;
}
See how the integer takes the character values in order to get and print individual characters using getchar() and putchar().
Special thanks to Keith Thompson here; I learnt some useful things today.
The most important thing to remember is this:
Arrays are not pointers.
But there are several language rules in both C and C++ that can make it seem as if they're the same thing. There are contexts in which an expression of array type or an expression of pointer type is legal. In those contexts, the expression of array type is implicitly converted to yield a pointer to the array's initial element.
char an_array[] = "hello";
const char *a_pointer = "goodbye";
an_array is an array object, of type char[6]. The string literal "hello" is used to initialize it.
a_pointer is a pointer object, of type const char*. You need the const because the string literal used to initialize it is read-only.
When an expression of array type (usually the name of an array object) appears in an expression, it is usually implicitly converted to a pointer to its initial (0th) element. So, for example, we can write:
char *ptr = an_array;
an_array is an array expression; it's implicitly converted to a char* pointer. The above is exactly equivalent to:
char *ptr = &(an_array[0]); // parentheses just for emphasis
There are 3 contexts in which an array expression is not converted to a pointer value:
When it's the operand of the sizeof operator. sizeof an_array yields the size of the array, not the size of a pointer.
When it's the operand of the unary & operator. &an_array yields the address of the entire array object, not the address of some (nonexistent) char* pointer object. It's of type "pointer to array of 6 chars", or char (*)[6].
When it's a string literal used as an initializer for an array object. In the example above:
char an_array[] = "hello";
the contents of the string literal "hello" are copied into an_array; it doesn't decay to a pointer.
Finally, there's one more language rule that can make it seem as if arrays were "really" pointers: a parameter defined with an array type is adjusted so that it's really of pointer type. You can define a function like:
void func(char param[10]);
and it really means:
void func(char *param);
The 10 is silently ignored.
The [] indexing operator requires two operands, a pointer and an integer. The pointer must point to an element of an array object. (A standalone object is treated as a 1-element array.) The expression
arr[i]
is by definition equivalent to
*(arr + i)
Adding an integer to a pointer value yields a new pointer that's advanced i elements forward in the array.
Section 6 of the comp.lang.c FAQ has an excellent explanation of all this stuff. (It applies to C++ as well as to C; the two languages have very similar rules in this area.)
In C++, your code generates a warning during compile:
{
//char* myPtr = "example"; // ISO C++ forbids converting a string
// constant to ‘char*’ [-Wpedantic]
// instead you should use the following form
char myPtr[] = "example"; // a c-style null terminated string
// the myPtr symbol is also treated as a char*, and not a const char*
myPtr[1] = 'k'; // still works,
std::cout << myPtr << std::endl; // output is 'ekample'
}
On the other hand, std::string is much more flexible, and has many more features:
{
std::string myPtr = "example";
myPtr[1] = 'k'; // works the same
// then, to print the corresponding null terminated c-style string
std::cout << myPtr.c_str() << std::endl;
// ".c_str()" is useful to create input to system calls requiring
// null terminated c-style strings
}
The semantics of abc[x] is "add x*sizeof(type) bytes to the address in abc and access the element there", where abc is any pointer. Array variables behave like pointers in such expressions: they are converted to a pointer to the beginning of the memory allocated to the array.
Hence adding x to an array or pointer variable yields an address that is x*sizeof(type) bytes past the one the variable refers to, where type is what the array contains or the pointer points to (e.g. for int pointers or int arrays, sizeof(int) is typically 4).
Array variables are not the same as pointers, as Keith said in a comment: an array declaration creates a fixed-size memory block, and arithmetic on the address of the whole array (&array) uses the size of the array, not the size of its element type.
When I type int * a = 10 it shows an error. But when I write char *b = "hello" it doesn't show an error?
We can't assign values directly to pointers, so how is it possible only for char? How is it possible to assign a value to a character pointer?
The type of "hello" is a char array, which decays into a char pointer. You can therefore use it to initialize a variable of type char*.
The type of 10 is int. It cannot be implicitly converted to int* and hence int *a = 10 is not valid. The following is perhaps the closest int equivalent to your char example:
int arr[] = {1, 2, 3};
int *a = arr;
(There is also an issue with constness here, which I am not addressing to keep things simple. See this question if you'd like to learn more.)
This is because "hello" is a string literal that represents an array of char. The starting address of the array is stored in the pointer b by the initialization char *b = "hello". 10 is a value of type int and cannot be assigned to a pointer to int.
The type of the string literal in C++ is char const[6], and char[6] in C (it's the number of characters in the literal, including the terminating NUL character).
While char *b = "hello"; is legal in C, it is deprecated in C++03, and illegal in C++11. You must write char const *b = "hello";
The reason that works is because both languages define an implicit conversion of array types to a pointer to the first element of the array. This is commonly referred to as decay of the array.
No such conversion is applicable to int *a = 10;, so that fails in both languages.