So from my understanding pointer variables point to an address. So, how is the following code valid in C++?
char* b= "abcd"; //valid
int *c= 1; //invalid
The first line
char* b= "abcd";
is valid in C, because "string literals", while used as initializer, boils down to the address of the first element in the literal, which is a pointer (to char).
Related, C11, chapter §6.4.5, string literals,
[...] The multibyte character
sequence is then used to initialize an array of static storage duration and length just
sufficient to contain the sequence. For character string literals, the array elements have
type char, and are initialized with the individual bytes of the multibyte character
sequence. [...]
and then, chapter §6.3.2.1 (emphasis mine)
Except when it is the operand of the sizeof operator, the _Alignof operator, or the
unary & operator, or is a string literal used to initialize an array, an expression that has
type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points
to the initial element of the array object and is not an lvalue.
However, as mentioned in comments, in C++11 onwards, this is not valid anymore as string literals are of type const char[] there and in your case, LHS lacks the const specifier.
OTOH,
int *c= 1;
is invalid (illegal) because, 1 is an integer constant, which is not the same type as int *.
In C and very old versions of C++, a string literal "abcd" is of type char[], a character array. Such an array can naturally get pointed at by a char*, but not by a int* since that's not a compatible type.
However, C and C++ are different, often incompatible programming languages. They dropped compatibility with each other some 20 years ago.
In standard C++, a string literal is of type const char[] and therefore none of your posted code is valid in C++. This won't compile:
char* b = "abcd"; //invalid, discards const qualifier
This will:
const char* c = "abcd"; // valid
"abcd" is actually a const char[5] type, and the language permits this to be assigned to a const char* (and, regrettably, a char* although C++11 onwards disallows it.).
int *c = 1; is not allowed by the C++ or C standards since you can't assign an int to an int* pointer (with the exception of 0, and in that case your intent will be expressed clearer by assigning nullptr instead).
"abcd" is the address that contains the sequence of five bytes 97 98 99 100 0 -- you cannot see what the address is in the source code, but the compiler will still assign it an address.
1 is also an address near the bottom of your [virtual] memory. This may not seem to be useful to you, but it is useful to other people, so even though the "standard" might not want to permit this, every compiler you are ever likely to run into will support this.
While all other answers give the correct answer of why you code doesn't work, using a compound literal to initialize c, is one way you can make your code work, e.g.
int *c= (int[]){ 1 };
printf ("int pointer c : %d\n", *c);
Note, there are differences between C and C++ in the use of compound literals, they are only available in C.
Related
I've been having really freaky stuff happening in my code. I believe I have tracked it down to the part labeled "here" (code is simplified, of course):
std::string func() {
char c;
// Do stuff that will assign to c
return "" + c; // Here
}
All sorts of stuff will happen when I try to cout the result of this function. I think I've even managed to get pieces of underlying C++ documentation, and many a segmentation fault. It's clear to me that this doesn't work in C++ (I've resorted to using stringstream to do conversions to string now), but I would like to know why. After using lots of C# for quite a while and no C++, this has caused me a lot of pain.
"" is a string literal. Those have the type array of N const char. This particular string literal is an array of 1 const char, the one element being the null terminator.
Arrays easily decay into pointers to their first element, e.g. in expressions where a pointer is required.
lhs + rhs is not defined for arrays as lhs and integers as rhs. But it is defined for pointers as the lhs and integers as the rhs, with the usual pointer arithmetic.
char is an integral data type in (i.e., treated as an integer by) the C++ core language.
==> string literal + character therefore is interpreted as pointer + integer.
The expression "" + c is roughly equivalent to:
static char const lit[1] = {'\0'};
char const* p = &lit[0];
p + c // "" + c is roughly equivalent to this expression
You return a std::string. The expression "" + c yields a pointer to const char. The constructor of std::string that expects a const char* expects it to be a pointer to a null-terminated character array.
If c != 0, then the expression "" + c leads to Undefined Behaviour:
For c > 1, the pointer arithmetic produces Undefined Behaviour. Pointer arithmetic is only defined on arrays, and if the result is an element of the same array.
If char is signed, then c < 0 produces Undefined Behaviour for the same reason.
For c == 1, the pointer arithmetic does not produce Undefined Behaviour. That's a special case; pointing to one element past the last element of an array is allowed (it is not allowed to use what it points to, though). It still leads to Undefined Behaviour since the std::string constructor called here requires its argument to be a pointer to a valid array (and a null-terminated string). The one-past-the-last element is not part of the array itself. Violating this requirement also leads to UB.
What probably now happens is that the constructor of std::string tries to determine the size of the null-terminated string you passed it, by searching the (first) character in the array that is equal to '\0':
string(char const* p)
{
// simplified
char const* end = p;
while(*end != '\0') ++end;
//...
}
this will either produce an access violation, or the string it creates contains "garbage".
It is also possible that the compiler assumes this Undefined Behaviour will never happen, and does some funny optimizations that will result in weird behaviour.
By the way, clang++3.5 emits a nice warning for this snippet:
warning: adding 'char' to a string does not append to the string
[-Wstring-plus-int]
return "" + c; // Here
~~~^~~
note: use array indexing to silence this warning
There are a lot of explanations of how the compiler interprets this code, but what you probably wanted to know is what you did wrong.
You appear to be expecting the + behavior from std::string. The problem is that neither of the operands actually is a std::string. C++ looks at the types of the operands, not the final type of the expression (here the return type, std::string) to resolve overloading. It won't pick std::string's version of + if it doesn't see a std::string.
If you have special behavior for an operator (either you wrote it, or got a library that provides it), that behavior only applies when at least one of the operands has class type (or reference to class type, and user-defined enumerations count too).
If you wrote
std::string("") + c
or
std::string() + c
or
""s + c // requires C++14
then you would get the std::string behavior of operator +.
(Note that none of these are actually good solutions, because they all make short-lived std::string instances that can be avoided with std::string(1, c))
The same thing goes for functions. Here's an example:
std::complex<double> ipi = std::log(-1.0);
You'll get a runtime error, instead of the expected imaginary number. That's because the compiler has no clue that it should be using the complex logarithm here. Overloading looks only at the arguments, and the argument is a real number (type double, actually).
Operator overloads ARE functions and obey the same rules.
This return statement
return "" + c;
is valid. There is used so called the pointer arithmetic. String literal "" is converted to pointer to its first character (in this case to its terminating zero) and integer value stored in c is added to the pointer.
So the result of expression
"" + c
has type const char *
Class std::string has conversion constructor that accepts argument of type const char *. The problem is that this pointer can points to beyond the string literal. So the function has undefined behaviour.
I do not see any sense in using this expression. If you want to build a string based on one character you could write for example
return std::string( 1, c );
the difference between C++ and C# is that in C# string literals have type System.String that has overloaded operator + for strings and characters (that are unicode characters in C#). In C++ string literals are constant character arrays and the semantic of operator + for arrays and integers are different. Arrays are converted to pointers to their first elements and there are used the pointer arithmetic.
It is standard class std::string that has overloaded operator + for characters. String literals in C++ are not objects of this class that is of type std::string.
Here is the code I'm having trouble to understand:
char* myPtr = "example";
myPtr[1] = 'x';
How am I allowed to use myPtr[1]? Why can I choose positions like a do on arrays? myPtr is not even an array.
Obs. I know about lookup table, literal pooling and string literals, my concern is just how this even compile. I don't use pointers that much.
Can anyone help?
Apparently you made an assumption that applicability of [] operator to something necessarily implies that that "something" is an array. This is not true. The built-in [] operator has no direct relation to arrays. The [] is just a shorthand for a combination of * and + operators: by definition a[b] means *(a + b), where one operand is required to be a pointer and another is required to be an integer.
Moreover, when you apply the [] operator to an actual array, that array gets implicitly converted to a pointer type first, and only then the resultant pointer can act as an operand of [] operator. This actually means the opposite of what you supposedly assumed initially: operator [] never works with arrays. By the time we get to the [] the array has already decayed to a pointer.
As a related side-note, this latter detail manifests itself in one obscure peculiarity of the first C language standard. In C89/90 the array-to-pointer conversion was not allowed for rvalue arrays, which also prevented the [] operator from working with such arrays
struct S { int a[10]; };
struct S foo(void) { struct S s = { 0 }; return s; }
int main()
{
foo().a[5];
/* ERROR: cannot convert array to pointer, and therefore cannot use [] */
return 0;
}
C99 expanded the applicability of that conversion thus making the above code valid.
It compiles according to §5.2.1/1 [expr.sub] of the C++ standard:
A postfix expression followed by an expression in square brackets is a postfix expression. One of the expressions shall have the type “array of T” or “pointer to T” and the other shall have unscoped enumeration or integral type. The result is of type “T”. The type “T” shall be a completely-defined object type.
The expression E1[E2] is identical (by definition) to *((E1)+(E2)), except that in the case of an array operand, the result is an lvalue if that operand is an lvalue and an xvalue otherwise.
Since "example" has type char const[8] it may decay to char const* (it used to decay to char* as well, but it's mostly a relict of the past) which makes it a pointer.
At which point the expression myPtr[1] becomes *(myPtr + 1) which is well defined.
Pointers hold the address of memory location of variables of specific data types they are assigned to hold. As others have pointed out its counter-intuitive approach take a bit of learning curve to understand.
Note that the string "example" itself is immutable however, the compiler doesn't prevent the manipulation of the pointer variable, whose new value is changed to address of string 'x' (this is not same as the address of x in 'example'),
char* myPtr = "example";
myPtr[1] = 'x';
Since myPtr is referencing immutable data when the program runs it will crash, though it compiles without issues.
From C perspective, here, you are dereferencing a mutable variable.
By default in C, the char pointer is defined as mutable, unless specifically stated as immutable through keyword const, in which case the binding becomes inseparable and hence you cannot assign any other memory address to the pointer variable after defining it.
Lets say your code looked like this,
const char *ptr ="example";
ptr[1] = 'x';
Now the compilation will fail and you cannot modify the value as this pointer variable is immutable.
You should use char pointer only to access the individual character in a string of characters.
If you want to do string manipulations then I suggest you declare an int to store each character's ASCII values from the standard input output like mentioned here,
#include<stdio.h>
int main()
{
int countBlank=0,countTab=0,countNewLine=0,c;
while((c=getchar())!=EOF)
{
if(c==' ')
++countBlank;
else if(c=='\t')
++countTab;
else if(c=='\n')
++countNewLine;
putchar(c);
}
printf("Blanks = %d\nTabs = %d\nNew Lines = %d",countBlank,countTab,countNewLine);
}
See how the integer takes ASCII values in order to get and print individual characters using getchar() and putchar().
A special thanks to Keith Thompson here learnt some useful things today.
The most important thing to remember is this:
Arrays are not pointers.
But there are several language rules in both C and C++ that can make it seem as if they're the same thing. There are contexts in which an expression of array type or an expression of pointer type is legal. In those contexts, the expression of array type is implicitly converted to yield a pointer to the array's initial element.
char an_array[] = "hello";
const char *a_pointer = "goodbye";
an_array is an array object, of type char[6]. The string literal "hello" is used to initialize it.
a_pointer is a pointer object, of type const char*. You need the const because the string literal used to initialize it is read-only.
When an expression of array type (usually the name of an array object) appears in an expression, it is usually implicitly converted to a pointer to its initial (0th) element. So, for example, we can write:
char *ptr = an_array;
an_array is an array expression; it's implicitly converted to a char* pointer. The above is exactly equivalent to:
char *ptr = &(an_array[0]); // parentheses just for emphasis
There are 3 contexts in which an array expression is not converted to a pointer value:
When it's the operand of the sizeof operator. sizeof an_array yields the size of the array, not the size of a pointer.
When it's the operand of the unary & operator. &an_array yields the address of the entire array object, not the address of some (nonexistent) char* pointer object. It's of type "pointer to array of 6 chars", or char (*)[6].
When it's a string literal used as an initializer for an array object. In the example above:
char an_array[] = "hello";
the contents of the string literal "hello" are copied into an_array; it doesn't decay to a pointer.
Finally, there's one more language rule that can make it seem as if arrays were "really" pointer: a parameter defined with an array type is adjusted so that it's really of pointer type. You can define a function like:
void func(char param[10]);
and it really means:
void func(char *param);
The 10 is silently ignored.
The [] indexing operator requires two operands, a pointer and an integer. The pointer must point to an element of an array object. (A standalone object is treated as a 1-element array.) The expression
arr[i]
is by definition equivalent to
*(arr + i)
Adding an integer to a pointer value yields a new pointer that's advanced i elements forward in the array.
Section 6 of the comp.lang.c FAQ has an excellent explanation of all this stuff. (It applies to C++ as well as to C; the two languages have very similar rules in this area.)
In C++, your code generates a warning during compile:
{
//char* myPtr = "example"; // ISO C++ forbids converting a string
// constant to ‘char*’ [-Wpedantic]
// instead you should use the following form
char myPtr[] = "example"; // a c-style null terminated string
// the myPtr symbol is also treated as a char*, and not a const char*
myPtr[1] = 'k'; // still works,
std::cout << myPtr << std::endl; // output is 'ekample'
}
On the other hand, std::string is much more flexible, and has many more features:
{
std::string myPtr = "example";
myPtr[1] = 'k'; // works the same
// then, to print the corresponding null terminated c-style string
std::cout << myPtr.c_str() << std::endl;
// ".c_str()" is useful to create input to system calls requiring
// null terminated c-style strings
}
The semantics of abc[x] is "Add x*sizeof(type)" to abc where abc is any memory pointer. Arrays variable behave like memory pointers and they just point to beginning of the memory location allocated to array.
Hence adding x to array or pointer variable both will point to memory which is same as variable pointing to + x*sizeof(type which array contains or pointer points to, e.g. in case of int pointers or int array it's 4)
Array variables are not same as pointer as said in comment by Keith as array declaration will create fix sized memory block and any arithmetic on that will use size of array not the element types in that array.
Out of curiosity, I'm wondering what the real underlying type of a C++ string literal is.
Depending on what I observe, I get different results.
A typeid test like the following:
std::cout << typeid("test").name() << std::endl;
shows me char const[5].
Trying to assign a string literal to an incompatible type like so (to see the given error):
wchar_t* s = "hello";
I get a value of type "const char *" cannot be used to initialize an entity of type "wchar_t *" from VS12's IntelliSense.
But I don't see how it could be const char * as the following line is accepted by VS12:
char* s = "Hello";
I have read that this was allowed in pre-C++11 standards as it was for retro-compatibility with C, although modification of s would result in Undefined Behavior. I assume that this is simply VS12 having not yet implemented all of the C++11 standard and that this line would normally result in an error.
Reading the C99 standard (from here, 6.4.5.5) suggests that it should be an array:
The multibyte character
sequence is then used to initialize an array of static storage duration and length just
sufficient to contain the sequence.
So, what is the type underneath a C++ string literal?
Thank you very much for your precious time.
The type of a string literal is indeed const char[SIZE] where SIZE is the length of the string plus the null terminating character.
The fact that you're sometimes seeing const char* is because of the usual array-to-pointer decay.
But I don't see how it could be const char * as the following line is accepted by VS12:
char* s = "Hello";
This was correct behaviour in C++03 (as an exception to the usual const-correctness rules) but it has been deprecated since. A C++11 compliant compiler should not accept that code.
The type of a string literal is char const[N] where N is the number of characters including the terminating null character. Although this type does not convert to char*, the C++ standard includes a clause allowing assignments of string literal to char*. This clause was added to support compatibility especially for C code which didn't have const back then.
The relevant clause for the type in the standard is 2.14.5 [lex.string] paragraph 8:
Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow string literal has type “array of n const char”, where n is the size of the string as defined below, and has static storage duration (3.7).
First off, the type of a C++ string literal is an array of n const char. Secondly, if you want to initialise a wchar_t with a string literal you have to code:
wchar_t* s = L"hello"
I know that for example "hello" is of type const char*. So my questions are:
How can we assign a literal string like "hello" to a non-const char* like this:
char* s = "hello"; // "hello" is type of const char* and s is char*
// and we know that conversion from const char* to
// char* is invalid
Is a literal string like "hello", which will take memory in all my program, or it's just like temporary variable that will get destroyed when the statement ends?
In fact, "hello" is of type char const[6].
But the gist of the question is still right – why does C++ allow us to assign a read-only memory location to a non-const type?
The only reason for this is backwards compatibility to old C code, which didn’t know const. If C++ had been strict here it would have broken a lot of existing code.
That said, most compilers can be configured to warn about such code as deprecated, or even do so by default. Furthermore, C++11 disallows this altogether but compilers may not enforce it yet.
For Standerdese Fans:
[Ref 1]C++03 Standard: §4.2/2
A string literal (2.13.4) that is not a wide string literal can be converted to an rvalue of type “pointer to char”; a wide string literal can be converted to an rvalue of type “pointer to wchar_t”. In either case, the result is a pointer to the first element of the array. This conversion is considered only when there is an explicit appropriate pointer target type, and not when there is a general need to convert from an lvalue to an rvalue. [Note: this conversion is deprecated. See Annex D. ] For the purpose of ranking in overload resolution (13.3.3.1.1), this conversion is considered an array-to-pointer conversion followed by a qualification conversion (4.4). [Example: "abc" is converted to “pointer to const char” as an array-to-pointer conversion, and then to “pointer to char” as a qualification conversion. ]
C++11 simply removes the above quotation which implies that it is illegal code in C++11.
[Ref 2]C99 standard 6.4.5/5 "String Literals - Semantics":
In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence; for wide string literals, the array elements have type wchar_t, and are initialized with the sequence of wide characters...
It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
is literal string like "hello" will take memory in all my program all it's just like a temporary variable that will get destroyed when the statement ends.
It is kept in programm data, so it is awaiable within lifetime of the programm. You can return pointers and references to this data from the current scope.
The only reason why const char* is being cast to char* is comatiblity with c, like winapi system calls. And this cast is made unexplicit unlike any other const casting.
Just use a string:
std::string s("hello");
That would be the C++ way. If you really must use char, you'll need to create an array and copy the contents over it.
The answer to your second question is that the variable s is stored in RAM as type pointer-to-char. If it's global or static, it's allocated on the heap and remains there for the life of the running program. If it's a local ("auto") variable, it's allocated on the stack and remains there until the current function returns. In either case, it occupies the amount of memory required to hold a pointer.
The string "Hello" is a constant, and it's stored as part of the program itself, along with all the other constants and initializers. If you built your program to run on an appliance, the string would be stored in ROM.
Note that, because the string is constant and s is a pointer, no copying is necessary. The pointer s simply points to wherever the string is stored.
In your example, you are not assigning, but constructing. std::string, for example, does have a std::string(const char *) constructor (actually it's more complicated, but it doesn't matter). And, similarly, char * (if it was a type rather than a pointer to a type) could have a const char * constructor, which is copying the memory.
I don't actually know how the compiler really works here, but I think it could be similar to what I've described above: a copy of "Hello" is constructed in stack and s is initialized with this copy's address.
Can you do this?
char* func()
{
char * c = "String";
return c;
}
is "String" here a globally allocated data by compiler?
You can do that. But it would be even more correct to say:
const char* func(){
return "String";
}
The c++ spec says that string literals are given static storage duration. I can't link to it because there are precious few versions of the c++ spec online.
This page on const correctness is the best reference I can find.
Section 2.13.4 of ISO/IEC 14882 (Programming languages - C++) says:
A string literal is a sequence of characters (as defined in 2.13.2) surrounded by double quotes, optionally
beginning with the letter L, as in "..." or L"...". A string literal that does not begin with L is an ordinary
string literal, also referred to as a narrow string literal. An ordinary string literal has type “array of n
const char” and static storage duration (3.7), where n is the size of the string as defined below, and is
initialized with the given characters. ...
Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation defined.
The effect of attempting to modify a string literal is undefined.
You can do this currently (there is no reason to, though). But you cannot do this anymore with C++0x. They removed the deprecated conversion of a string literal (which has the type const char[N]) to a char *.
Note that this conversion is only for string literals. Thus the following two things are illegal, the first of which specifies an array and the second of which specifies a pointer for initialization
char *x = (0 ? "123" : "345"); // illegal: const char[N] -> char*
char *x = +"123"; // illegal: const char * -> char*
GCC incorrectly accepts both, Clang correctly rejects both.
The constant is not allocated on the heap but it is a constant. You don't need to destroy it.
Not in a modern compiler. In modern compilers, the type of "String" is const char *, which you can't assign to a char * due to the const mismatch.
If you made c a const char * (and changed the return type of the function), the code would be legal. Typically the string literal "String" would be placed in the executable's data section by the linker, and in many cases, in a special section for read-only data.