Here is the code example (compiled and run in VS2015):
#include<cassert>
using namespace std;
int main() {
const char*p = "ohoh";
const char*p1 = "ohoh";
char p3[] = "ohoh";
char p4[] = "ohoh";
assert(p == p1);//OK,success,is this always true?
assert(p3 == p4);//failed
return 0;
}
As far as I know, string literals are stored in a read-only segment of the address space, and const char*p = "ohoh"; just generates a pointer to that position. However, it seems the compiler generates only one copy of the string literal, so p == p1 is true.
Is it an optimization, or something guaranteed by the standard?
No, it is not guaranteed by the standard. According to cppref:
The compiler is allowed, but not required, to combine storage for equal or overlapping string literals. That means that identical string literals may or may not compare equal when compared by pointer.
The behavior is unspecified, you can't rely on it. From the standard, [lex.string]/16
Whether all string literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified.
For p3 and p4, they're different things. Note that p and p1 are pointers (to string literal) but p3 and p4 are arrays initialized from string literals.
String literals can be used to initialize character arrays. If an array is initialized like char str[] = "foo";, str will contain a copy of the string "foo".
That means p3 and p4 are independent arrays. When they decay to pointers, the pointers are different (because they point to different arrays), so p3 == p4 is false.
It is implementation defined whether equal string literals are stored by the compiler as one string literal. So this comparison
p == p1
can yield either true or false depending on the compiler options.
As for arrays, they do not have a built-in comparison operator that compares their contents; p3 == p4 compares the addresses of their first elements.
Instead of
assert(p == p1);
assert(p3 == p4);
you could write
assert( strcmp( p, p1 ) == 0 );
assert( strcmp( p3, p4 ) == 0 );
String literals may share storage, and may be in read-only memory.
Neither is guaranteed though.
What is guaranteed is that two different arrays won't share space unless their lifetime does not overlap. In the latter case there's no conforming way to prove it anyway, so who cares?
In a book I am reading, C++ from scratch, on page 113 the writer creates a char array:
myString[80];
then he uses a function that copies some characters:
strcpy(myString,"Hello there");
then he creates a pointer to the array:
*p1 = myString;
then he uses an offset and assigns a value at that offset:
p1[4]='c';
My question is: p1 is a pointer, so it is a memory address, and the offset of 4 gives him the memory address 4 places further on, so that means he is assigning the letter 'c' to the memory address rather than to the value stored at that address. Shouldn't it be:
*(p1[4])='c';
basically, how come *(p1 + 4) needs dereferencing but p1[4] does not?
I tried to understand this and the only thing that could make sense to me was if the square brackets act as an asterisk to dereference the pointer. Is this correct or is there another reason why p1[4] does not need to be dereferenced?
then he creates a pointer to the array:
*p1 = myString;
Assuming, by this, you actually mean that p1 is declared and initialised using
char *p1 = myString;
then your interpretation is wrong. p1 is a pointer to a char not a pointer to an array.
In this definition, myString is the (previously declared in your question) name of an array of char. In the initialisation
char *p1 = myString;
the name myString is converted to a pointer. That pointer will have the value &myString[0] (i.e. the address of the first character in myString). That is the value that p1 will receive.
The statement
p1[4] = 'c';
will then set the fifth character (since indexing is zero based) of myString to be 'c'. The result is therefore changing myString[4] to the value 'c'. This means that (the first 11 characters of) myString will be "Hellc there".
Assuming the above, the expression *(p1[4])='c' will not compile, since (p1[4]) is of type char, and cannot be dereferenced using the * operator.
Semantically, in an expression p1[4] is equivalent to *(p1 + 4). Since p1 is initialised to be equal to &myString[0], p1[4] is ALSO equivalent to both myString[4] and to *(myString + 4).
Note: If *(p1[4])='c' was valid in your code, then p1[4] = 'c' would not be valid, which suggests my assumption about the declaration and initialisation of p1 is correct - despite the fact you have omitted such information.
p1 + 4 would be the memory address 4 places beyond.
The expression p1[4] (exactly equivalent to *(p1+4)) is called an lvalue expression. We say that an lvalue expression designates a memory location. Or in other words, the lvalue expression is synonymous with the memory location itself. You can always use the & address-of operator on an lvalue expression, and that gives you a pointer to the memory location.
The lvalue expression will go on to be used in one of three possible ways:
Store a value in the designated memory location, or
Retrieve a value from the designated memory location, or
Neither of those.
There is no special syntax to distinguish between these three cases; rather it depends on the larger expression of which the lvalue expression is a part of. For example, applying the & operator is case 3; appearing on the left-hand side of the assignment operator = is case 1. Most other usages fall under case 2.
p1[4] is treated as *(p1 + 4) which is "value at an offset of 4 from p1".
Also when you declare a char array:
myString[80]
myString is an array, but in most expressions it is converted to a pointer to its first element. So when you do myString[4] it is also treated as *(myString + 4)
Writing *(p1[4]) would mean *(*(p1 + 4))
Here is the code I'm having trouble to understand:
char* myPtr = "example";
myPtr[1] = 'x';
How am I allowed to use myPtr[1]? Why can I choose positions like I do on arrays? myPtr is not even an array.
Obs. I know about lookup tables, literal pooling and string literals; my concern is just how this even compiles. I don't use pointers that much.
Can anyone help?
Apparently you made an assumption that applicability of [] operator to something necessarily implies that that "something" is an array. This is not true. The built-in [] operator has no direct relation to arrays. The [] is just a shorthand for a combination of * and + operators: by definition a[b] means *(a + b), where one operand is required to be a pointer and another is required to be an integer.
Moreover, when you apply the [] operator to an actual array, that array gets implicitly converted to a pointer type first, and only then the resultant pointer can act as an operand of [] operator. This actually means the opposite of what you supposedly assumed initially: operator [] never works with arrays. By the time we get to the [] the array has already decayed to a pointer.
As a related side-note, this latter detail manifests itself in one obscure peculiarity of the first C language standard. In C89/90 the array-to-pointer conversion was not allowed for rvalue arrays, which also prevented the [] operator from working with such arrays
struct S { int a[10]; };
struct S foo(void) { struct S s = { 0 }; return s; }
int main()
{
foo().a[5];
/* ERROR: cannot convert array to pointer, and therefore cannot use [] */
return 0;
}
C99 expanded the applicability of that conversion thus making the above code valid.
It compiles according to §5.2.1/1 [expr.sub] of the C++ standard:
A postfix expression followed by an expression in square brackets is a postfix expression. One of the expressions shall have the type “array of T” or “pointer to T” and the other shall have unscoped enumeration or integral type. The result is of type “T”. The type “T” shall be a completely-defined object type.
The expression E1[E2] is identical (by definition) to *((E1)+(E2)), except that in the case of an array operand, the result is an lvalue if that operand is an lvalue and an xvalue otherwise.
Since "example" has type char const[8] it may decay to char const* (it used to decay to char* as well, but that is mostly a relic of the past), which makes it a pointer.
At which point the expression myPtr[1] becomes *(myPtr + 1) which is well defined.
A pointer holds the address of an object of the specific data type it is declared to point to. As others have pointed out, this counter-intuitive approach takes a while to learn.
Note that the string "example" itself is immutable; however, the compiler doesn't prevent writing through the non-const pointer variable, so the second statement below attempts to overwrite the second character of the literal:
char* myPtr = "example";
myPtr[1] = 'x';
Since myPtr refers to immutable data, the behaviour at run time is undefined and the program will typically crash, though it compiles without errors.
From C perspective, here, you are dereferencing a mutable variable.
By default in C, the data a char pointer refers to is treated as mutable unless the declaration says otherwise with the keyword const, in which case you cannot modify the characters through that pointer (you can still assign another address to the pointer variable itself).
Let's say your code looked like this:
const char *ptr ="example";
ptr[1] = 'x';
Now the compilation will fail, and you cannot modify the value, because the pointer refers to const data.
You should use char pointer only to access the individual character in a string of characters.
If you want to process characters from standard input, I suggest you declare an int to store each character's value, as shown here:
#include<stdio.h>
int main()
{
int countBlank=0,countTab=0,countNewLine=0,c;
while((c=getchar())!=EOF)
{
if(c==' ')
++countBlank;
else if(c=='\t')
++countTab;
else if(c=='\n')
++countNewLine;
putchar(c);
}
printf("Blanks = %d\nTabs = %d\nNew Lines = %d",countBlank,countTab,countNewLine);
}
See how the integer takes ASCII values in order to get and print individual characters using getchar() and putchar().
Special thanks to Keith Thompson here; I learnt some useful things today.
The most important thing to remember is this:
Arrays are not pointers.
But there are several language rules in both C and C++ that can make it seem as if they're the same thing. There are contexts in which an expression of array type or an expression of pointer type is legal. In those contexts, the expression of array type is implicitly converted to yield a pointer to the array's initial element.
char an_array[] = "hello";
const char *a_pointer = "goodbye";
an_array is an array object, of type char[6]. The string literal "hello" is used to initialize it.
a_pointer is a pointer object, of type const char*. You need the const because the string literal used to initialize it is read-only.
When an expression of array type (usually the name of an array object) appears in an expression, it is usually implicitly converted to a pointer to its initial (0th) element. So, for example, we can write:
char *ptr = an_array;
an_array is an array expression; it's implicitly converted to a char* pointer. The above is exactly equivalent to:
char *ptr = &(an_array[0]); // parentheses just for emphasis
There are 3 contexts in which an array expression is not converted to a pointer value:
When it's the operand of the sizeof operator. sizeof an_array yields the size of the array, not the size of a pointer.
When it's the operand of the unary & operator. &an_array yields the address of the entire array object, not the address of some (nonexistent) char* pointer object. It's of type "pointer to array of 6 chars", or char (*)[6].
When it's a string literal used as an initializer for an array object. In the example above:
char an_array[] = "hello";
the contents of the string literal "hello" are copied into an_array; it doesn't decay to a pointer.
Finally, there's one more language rule that can make it seem as if arrays were "really" pointers: a parameter defined with an array type is adjusted so that it's really of pointer type. You can define a function like:
void func(char param[10]);
and it really means:
void func(char *param);
The 10 is silently ignored.
The [] indexing operator requires two operands, a pointer and an integer. The pointer must point to an element of an array object. (A standalone object is treated as a 1-element array.) The expression
arr[i]
is by definition equivalent to
*(arr + i)
Adding an integer to a pointer value yields a new pointer that's advanced i elements forward in the array.
Section 6 of the comp.lang.c FAQ has an excellent explanation of all this stuff. (It applies to C++ as well as to C; the two languages have very similar rules in this area.)
In C++, your code generates a warning during compile:
{
//char* myPtr = "example"; // ISO C++ forbids converting a string
// constant to ‘char*’ [-Wpedantic]
// instead you should use the following form
char myPtr[] = "example"; // a c-style null terminated string
// the myPtr symbol is also treated as a char*, and not a const char*
myPtr[1] = 'k'; // still works,
std::cout << myPtr << std::endl; // output is 'ekample'
}
On the other hand, std::string is much more flexible, and has many more features:
{
std::string myPtr = "example";
myPtr[1] = 'k'; // works the same
// then, to print the corresponding null terminated c-style string
std::cout << myPtr.c_str() << std::endl;
// ".c_str()" is useful to create input to system calls requiring
// null terminated c-style strings
}
The semantics of abc[x] is "add x*sizeof(type) bytes to abc and access that location", where abc is any pointer. In most expressions an array variable behaves like a pointer to the beginning of the memory allocated to the array.
Hence adding x to an array or pointer variable yields the address x*sizeof(type) bytes beyond the one the variable refers to, where type is the type the array contains or the pointer points to (e.g. for an int pointer or int array, sizeof(int) is typically 4).
Array variables are not the same as pointers, as Keith said in a comment: an array declaration creates a fixed-size memory block, and arithmetic on the address of the whole array (such as &abc + 1) uses the size of the array, not the size of its element type.
I was under the impression that comparison operators are not defined for C-style strings, which is why we use things like strcmp(). Therefore the following code would be illegal in C and C++:
if("foo" == "foo"){
printf("The C-style comparison worked.\n");
}
if("foo" == "bob"){
printf("The C-style comparison produced the incorrect answer.\n");
} else {
printf("The C-style comparison worked, strings were not equal.\n");
}
But I tested it in both Codeblocks using GCC and in VS 2015, compiling as C and also as C++. Both allowed the code and produced the correct output.
Is it legal to compare C-style strings? Or is it a non-standard compiler extension that allows this code to work?
If this is legal, then why do people use strcmp() in C?
The compiler is free to use string interning, i.e. save memory by avoiding to duplicate identical data. The 2 "foo" literals that compare equal must be stored in the same memory location in your case.
However, you should not take this as the rule. The strcmp function will work under all circumstances, whereas it is unspecified whether your observation will hold with another compiler, compiler version, set of compilation flags, etc.
The code is legal in C. It just may not produce the result you expected.
The type of a string literal is char[N] in C and const char[N] in C++, where N is the number of characters in the string literal, including the terminating null character.
"foo" is type char[4] and const char[4] in C and C++ respectively. Basically it's an array. An array gets converted into a pointer to its first element when used in an expression. So in the comparison, if("foo" == "foo") the string literals get converted into pointers. Hence, the "address comparison".
In the comparison,
if("foo" == "foo"){
the addresses of the string literals are compared, which may or may not be equal.
It is equivalent to:
const char *p = "foo";
const char *q = "foo";
if ( p == q) {
...
}
The C standard doesn't guarantee that two string literals with the same content (the "foo"s here) are placed at the same location. In practice, many compilers do place them at the same address, so the comparison seems to work. But you can't rely on this behaviour.
6.4.5, String literals (C11, draft)
It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined.
Similarly, this comparison
if("foo" == "bob"){
...
}
is equivalent to:
const char *x = "foo";
const char *y = "bob";
if (x == y) {
...
}
In this case, the string literals would be at different locations and pointer comparison fails. So in both cases, it looks as if the == operator actually works for comparing C-strings.
Instead if you do comparisons using arrays, it will not work:
char s1[] ="foo";
char s2[] = "foo";
if (s1 == s2) {
/* always false */
}
The difference is that when an array is initialized with a string literal, the literal is copied into the array. The arrays s1 and s2 have distinct addresses and will never compare equal. But in the case of string literals, both p and q point to the same address (assuming the compiler places them so; this is not guaranteed, as noted above).
It is comparing the addresses of the strings, not the content of the strings.
Comparing the addresses is a valid operation.
Suppose I have two pointers:
char* p1 = nullptr;
char* p2 = static_cast<char*>( std::malloc( 4 ) );
std::size_t offset = p2 - p1;
Is it safe to get offset in this way? So far it works fine on my computer. But I'm wondering if the offset can exceed the maximum number of size_t such that this method fails?
This is undefined behavior, from the draft C++ standard section 5.7 Additive operators:
When two pointers to elements of the same array object are subtracted,
the result is the difference of the subscripts of the two array
elements. The type of the result is an implementation-defined signed
integral type; this type shall be the same type that is defined as
std::ptrdiff_t in the <cstddef> header (18.2). [...] Unless both
pointers point to elements of the same array object, or one past the
last element of the array object, the behavior is
undefined.
Also as the reference mentions, the result is std::ptrdiff_t not size_t.
You can, on the other hand, add or subtract the value 0, which is covered in paragraph 7:
If the value 0 is added to or subtracted from a pointer value, the
result compares equal to the original pointer value. If two pointers
point to the same object or both point one past the end of the same
array or both are null, and the two pointers are subtracted, the
result compares equal to the value 0 converted to the type
std::ptrdiff_t.
If you want to convert a pointer to an integral value then you should use either intptr_t or uintptr_t:
intptr_t integer type capable of holding a pointer
uintptr_t unsigned integer type capable of holding a pointer
For example:
uintptr_t ip = reinterpret_cast<uintptr_t>( p2 ) ;
No it is not safe. Basically the only thing you can do with null pointer is to compare it with another pointer. As for addition and subtraction one can only add or subtract zero to a null pointer, and subtract two null pointers - which may be useful in generic programming. Your case is undefined behaviour.
In addition to the answer by Wojtek, pointer arithmetic can and should only be done between related pointers. For example, if you have char* p3 = p2 + 4, then you could do p3 - p2 to get the difference between the two pointers; that would be legal.
However, something like
char* p4 = new char[4];
std::cout << p4 - p2 << '\n';
is not legal, as p2 and p4 are not related.
As the heading says, What is the difference between
char a[] = ?string?; and
char *p = ?string?;
I was asked this question in an interview.
I don't even understand the statement.
char a[] = ?string?
Here what is ? operator? Is it a part of a string or it has some specific meaning?
The ? seems to be a typo, it is not semantically valid. So the answer assumes the ? is a typo and explains what probably the interviewer actually meant to ask.
Both are distinctly different, for a start:
The first creates an array.
The second creates a pointer.
Read on for more detailed explanation:
The Array version:
char a[] = "string";
Creates an array that is large enough to hold the string literal "string", including its null terminator. The array a is initialized with the string literal "string". The array can be modified at a later time. Also, the array's size is known at compile time, so the sizeof operator can be used to determine its size.
The pointer version:
char *p = "string";
Creates a pointer that points to the string literal "string". This is faster than the array version, but the string pointed to by the pointer should not be changed, because it is located in read-only, implementation-defined memory. Modifying such a string literal results in undefined behavior.
In fact C++03 deprecates[Ref 1] use of string literal without the const keyword. So the declaration should be:
const char *p = "string";
Also, you need to use the strlen() function, and not sizeof, to find the length of the string, since the sizeof operator will just give you the size of the pointer variable.
Which version is better and which one shall I use?
Depends on the Usage.
If you do not need to make any changes to the string, use the pointer version.
If you intend to change the data, use the array version.
Note: The following is specific to C, not C++.
Note that use of a string literal without the const keyword is perfectly valid in C.
However, modifying a string literal is still undefined behavior in C[Ref 2].
This brings up an interesting question,
What is the difference between char* and const char* when used with string literals in C?
For Standardese Fans:
[Ref 1]C++03 Standard: §4.2/2
A string literal (2.13.4) that is not a wide string literal can be converted to an rvalue of type “pointer to char”; a wide string literal can be converted to an rvalue of type “pointer to wchar_t”. In either case, the result is a pointer to the first element of the array. This conversion is considered only when there is an explicit appropriate pointer target type, and not when there is a general need to convert from an lvalue to an rvalue. [Note: this conversion is deprecated. See Annex D. ] For the purpose of ranking in overload resolution (13.3.3.1.1), this conversion is considered an array-to-pointer conversion followed by a qualification conversion (4.4). [Example: "abc" is converted to “pointer to const char” as an array-to-pointer conversion, and then to “pointer to char” as a qualification conversion. ]
C++11 simply removes the above quotation which implies that it is illegal code in C++11.
[Ref 2]C99 standard 6.4.5/5 "String Literals - Semantics":
In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence; for wide string literals, the array elements have type wchar_t, and are initialized with the sequence of wide characters...
It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
The first one is array the other is pointer.
The array declaration char a[6]; requests that space for six characters be set aside, to be known by the name a. That is, there is a location named a at which six characters can sit. The pointer declaration char *p; on the other hand, requests a place which holds a pointer. The pointer is to be known by the name p, and can point to any char (or contiguous array of chars) anywhere.
The statements
char a[] = "string";
char *p = "string";
would result in data structures which could be represented like this:
   +---+---+---+---+---+---+----+
a: | s | t | r | i | n | g | \0 |
   +---+---+---+---+---+---+----+
   +-----+     +---+---+---+---+---+---+---+
p: |  *======> | s | t | r | i | n | g |\0 |
   +-----+     +---+---+---+---+---+---+---+
It is important to realize that a reference like x[3] generates different code depending on whether x is an array or a pointer. Given the declarations above, when the compiler sees the expression a[3], it emits code to start at the location a, move three elements past it, and fetch the character there. When it sees the expression p[3], it emits code to start at the location p, fetch the pointer value there, add three element sizes to the pointer, and finally fetch the character pointed to. In the example above, both a[3] and p[3] happen to be the character i, but the compiler gets there differently.
Source: comp.lang.c FAQ list · Question 6.2
char a[] = "string";
This allocates the string on the stack.
char *p = "string";
This creates a pointer on the stack that points to the literal in the data segment of the process.
? is whoever wrote it not knowing what they were doing.
Stack, heap, data segment (and BSS) and text segment are the four segments of process memory. All local variables are on the stack. Memory dynamically allocated using malloc and calloc is on the heap. All global and static variables are in the data segment. The text segment holds the program's machine code and some constants.
Of these 4 segments, the text segment is READ ONLY and the other three are READ and WRITE.
char a[] = "string"; - This statement allocates 7 bytes on the stack (because it is a local variable) and stores all 6 characters (s, t, r, i, n, g) plus the null character (\0) at the end.
char *p = "string"; - This statement allocates 4 bytes (on a 32-bit machine) on the stack (because this is also a local variable) to hold a pointer to the constant string whose value is "string". The 7 bytes of the constant string (6 characters plus \0) are in the text segment. This is a constant value. The pointer variable p just points to that string.
Now a[0] (the index can be 0 to 5) accesses the first character of the string on the stack. So we can also write at this position: a[0] = 'x'. This operation is allowed because we have READ WRITE access to the stack.
But p[0] = 'x' will lead to a crash, because we have only READ access to the text segment. A segmentation fault will happen if we write to the text segment.
But you can change the value of the variable p, because it is a local variable on the stack, as below:
char *p = "string";
printf("%s", p);
p = "start";
printf("%s", p);
This is allowed. Here we are changing the address stored in the pointer variable p to the address of the string "start" (again, "start" is also read-only data in the text segment). If you want to modify the values that p points to, use dynamically allocated memory.
char *p = NULL;
p = malloc(sizeof(char)*7);
strcpy(p, "string");
Now p[0] = 'x' operation is allowed, because now we are writing in heap.
char *p = "string"; creates a pointer to read-only memory where string literal "string" is stored. Trying to modify string that p points to leads to undefined behaviour.
char a[] = "string"; creates an array and initializes its content by using string literal "string".
They do differ as to where the memory is stored. Ideally the second one should use const char *.
The first one
char buf[] = "hello";
creates an automatic buffer big enough to hold the characters and copies them in (including the null terminator).
The second one
const char * buf = "hello";
should use const and simply creates a pointer that points at memory usually stored in static space where it is illegal to modify it.
The converse (of the fact you can modify the first safely and not the second) is that it is safe to return the second pointer from a function, but not the first. This is because the second one will remain a valid memory pointer outside the scope of the function, the first will not.
const char * sayHello()
{
const char * buf = "hello";
return buf; // valid
}
const char * sayHelloBroken()
{
char buf[] = "hello";
return buf; // invalid
}
a declares an array of char values -- an array of chars which is terminated.
p declares a pointer, which refers to an immutable, terminated, C string, whose exact storage location is implementation-defined. Note that this should be const-qualified (e.g. const char *p = "string";).
If you print it out using std::cout << "a: " << sizeof(a) << "\np: " << sizeof(p) << std::endl;, you will see the difference in their sizes (note: values may vary by system):
a: 7
p: 8
Here what is ? operator? Is it a part of a string or it has some specific meaning?
char a[] = ?string?
I assume they were once double quotes "string", which potentially were converted to "smart quotes", then could not be represented as such along the way, and were converted to ?.
C and C++ have very similar Pointer to Array relationships...
I can't speak for the exact memory locations of the two statements you are asking about, but I found these articles interesting and useful for understanding some of the differences between a char pointer declaration and a char array declaration.
For clarity:
C Pointer and Array relationship
C++ Pointer to an Array
I think it's important to remember that an array, in C and C++, is converted in most expressions to a pointer to the first element of the array. Consequently you can perform pointer arithmetic on the array.
char *p = "string"; <--- This is a pointer that points to the first address of a character string.
the following is also possible:
char *p;
char a[] = "string";
p = a;
At this point p now references the first memory address of a (the address of the first element)
and so *p == 's'
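*(++p) == 't' and so on (or *(p+1) == 't'); note that *(p++) would still yield 's', because post-increment returns the old value.
*(a+1) would also equal 't', but a++ does not compile, because an array cannot be incremented.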