The difference between char* and int* - c++

What is the difference between char* and int*? Sure, they are of different types, but how is it that I can write
char* s1="hello world";
even though "hello world" is not a single character but an array of characters, and yet I cannot write
char* s1 = {'h','e','l','l','o',' ','w','o','r','l','d'};
and
int* a = {2,3,1,45,6};
What is the difference?

It is quite simple: a string literal, e.g. "foobar", is compiled to a null-terminated array of chars that is stored in the static section of your program (i.e., where all constants are stored). Assigning it to a pointer variable then simply stores the address of this memory in the variable. E.g., const char* a = "foo"; assigns the address where "foo" is stored to a.
In short, a string constant already brings the memory where it is to be stored with it.
In contrast, initializing a pointer from an initializer list with several elements (i.e., a list of elements inside curly braces) is not defined. Informally, the problem with an initializer list, in contrast to a string literal, is that it does not "bring its own memory". Therefore, we must provide memory where the elements can be stored. This is done by declaring an array instead of a pointer. This compiles fine:
char s1[11]={'h','e','l','l','o',' ','w','o','r','l','d'};
Now, we provided the space where the chars are to be stored by declaring s1 as an array.
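A minimal sketch of the distinction, assuming C++11 or later; the function names greeting and sum_of_list are hypothetical and exist only for illustration. The ill-formed pointer initialization is kept as a comment so that the block compiles.

```cpp
#include <cassert>
#include <cstring>

// A string literal brings its own (static) storage, so a pointer can
// simply record the address of its first character.
const char* greeting() {
    const char* s1 = "hello world";
    return s1;  // safe: the literal outlives this function
}

// A brace list brings no storage of its own, so an array must provide it.
int sum_of_list() {
    // int* a = {2, 3, 1, 45, 6};  // ill-formed: a pointer takes one initializer
    int a[] = {2, 3, 1, 45, 6};    // the array owns the five ints
    int* p = a;                    // the array converts to &a[0]
    int total = 0;
    for (int i = 0; i < 5; ++i) {
        total += p[i];
    }
    return total;
}
```

Once the array exists, a pointer can refer to its storage just as s1 refers to the literal's storage.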
Note that you can use brace initialization of pointers, though, e.g.:
char* c2 = {nullptr};
However, while the syntax looks the same, this is something completely different, called uniform (brace) initialization, and it simply initializes c2 with nullptr.

In your first case, the string literal decays to a pointer to const char. Several compilers accept the non-const char* form as an extension, but s1 really should be declared as:
const char* s1 = "hello world" ;
A string literal is an array of const char, as we can see from the draft C++ standard section 2.14.5 String literals, which says (emphasis mine going forward):
Ordinary string literals and UTF-8 string literals are also referred
to as narrow string literals. A narrow string literal has type “array
of n const char”, where n is the size of the string as defined below,
and has static storage duration (3.7).
The conversion of an array to pointer is covered in section 4.2 Array-to-pointer conversion which says:
[...] an expression that has type ‘‘array of type’’ is converted to an
expression with type ‘‘pointer to type’’ that points to the initial
element of the array object and is not an lvalue.[...]
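To make the two quoted rules concrete, here is a sketch assuming C++11 (for decltype and std::is_same); first_char_of_literal and literal_size are hypothetical names. The literal's own type is an array of 12 const char (11 characters plus the terminating '\0'), and it converts to const char* when used to initialize a pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// A string literal is an lvalue, so decltype yields a reference to its array type.
static_assert(std::is_same<decltype("hello world"), const char (&)[12]>::value,
              "a narrow string literal is an array of n const char");

// sizeof sees the whole array, including the terminating '\0'.
constexpr std::size_t literal_size = sizeof("hello world");

// Initializing a pointer triggers the array-to-pointer conversion.
const char* first_char_of_literal() {
    const char* s1 = "hello world";
    return s1;
}
```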
Your other cases do not work because a scalar, which can be an arithmetic type, an enumeration type, or a pointer type, can only be initialized with a single element inside braces. This is covered in the draft C++ standard section 8.5.4 List-initialization paragraph 3, which says:
List-initialization of an object or reference of type T is defined as
follows:
and then enumerates the different cases; the only one that applies to the right-hand side here is the following bullet:
Otherwise, if the initializer list has a single element of type E and
either T is not a reference type or its referenced type is
reference-related to E, the object or reference is initialized from
that element; if a narrowing conversion (see below) is required to
convert the element to T, the program is ill-formed.
which requires the list to have a single element; otherwise the final bullet applies:
Otherwise, the program is ill-formed.
In your two cases, even if you reduced the initializer to a single element, the types would be wrong:
'h' is a char and 2 is an int, and neither of them converts to a pointer.
The initialization could be made to work by declaring arrays instead of pointers, such as the following:
char s1[] = { 'h', 'e', 'l', 'l', 'o',' ', 'w', 'o', 'r', 'l', 'd' } ;
int a[] = { 2, 3, 1, 45, 6 } ;
This is covered by section 8.5.1 Aggregates, which says:
An array of unknown size initialized with a brace-enclosed
initializer-list containing n initializer-clauses, where n shall be
greater than zero, is defined as having n elements (8.3.4). [ Example:
int x[] = { 1, 3, 5 };
declares and initializes x as a one-dimensional array that has three
elements since no size was specified and there are three initializers.
—end example ] An empty initializer list {} shall not be used as the
initializer-clause for an array of unknown bound.
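A quick check of the rule, assuming C++11; count_of is a hypothetical helper. The bound is deduced from the number of initializers, and the brace-initialized char array gets no terminating '\0', unlike an array initialized from a string literal.

```cpp
#include <cassert>
#include <cstddef>

// Element count of a built-in array, deduced from the array's type.
template <typename T, std::size_t N>
constexpr std::size_t count_of(const T (&)[N]) {
    return N;
}

constexpr int x[] = {1, 3, 5};  // three initializers, so int[3]
constexpr char braced[] = {'h', 'e', 'l', 'l', 'o', ' ',
                           'w', 'o', 'r', 'l', 'd'};  // char[11], no '\0'
constexpr char from_literal[] = "hello world";      // char[12], includes '\0'
```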
Note:
It is incorrect to say that a brace-init-list is not defined for pointers, it is perfectly usable for pointers:
int x = 10 ;
int *ip = &x ;
int *a = {nullptr} ;
int *b = {ip} ;

Related

Are pointers arrays?

Here is the code I'm having trouble understanding:
char* myPtr = "example";
myPtr[1] = 'x';
How am I allowed to use myPtr[1]? Why can I choose positions like I do on arrays? myPtr is not even an array.
Note: I know about lookup tables, literal pooling and string literals; my concern is just how this even compiles. I don't use pointers that much.
Can anyone help?
Apparently you made an assumption that applicability of [] operator to something necessarily implies that that "something" is an array. This is not true. The built-in [] operator has no direct relation to arrays. The [] is just a shorthand for a combination of * and + operators: by definition a[b] means *(a + b), where one operand is required to be a pointer and another is required to be an integer.
Moreover, when you apply the [] operator to an actual array, that array gets implicitly converted to a pointer type first, and only then the resultant pointer can act as an operand of [] operator. This actually means the opposite of what you supposedly assumed initially: operator [] never works with arrays. By the time we get to the [] the array has already decayed to a pointer.
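A minimal sketch of that definition, assuming C++11 or later; subscript_forms_agree is a hypothetical name. It only reads through a const char*, so nothing is written to the literal. The form 3[p] compiles precisely because a[b] is just *(a + b) and the addition is commutative.

```cpp
#include <cassert>

// Four spellings of the same element access; "example"[3] is 'm'.
bool subscript_forms_agree() {
    const char* p = "example";  // const: the literal itself is read-only
    char a = p[3];              // subscript applied to a pointer
    char b = *(p + 3);          // the definition of p[3], spelled out
    char c = 3[p];              // also legal: it means *(3 + p)
    char arr[] = "example";     // a real array (a modifiable copy)...
    char d = arr[3];            // ...converted to &arr[0] before [] applies
    return a == 'm' && b == 'm' && c == 'm' && d == 'm';
}
```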
As a related side-note, this latter detail manifests itself in one obscure peculiarity of the first C language standard. In C89/90 the array-to-pointer conversion was not allowed for rvalue arrays, which also prevented the [] operator from working with such arrays:
struct S { int a[10]; };
struct S foo(void) { struct S s = { 0 }; return s; }
int main()
{
foo().a[5];
/* ERROR: cannot convert array to pointer, and therefore cannot use [] */
return 0;
}
C99 expanded the applicability of that conversion thus making the above code valid.
It compiles according to §5.2.1/1 [expr.sub] of the C++ standard:
A postfix expression followed by an expression in square brackets is a postfix expression. One of the expressions shall have the type “array of T” or “pointer to T” and the other shall have unscoped enumeration or integral type. The result is of type “T”. The type “T” shall be a completely-defined object type.
The expression E1[E2] is identical (by definition) to *((E1)+(E2)), except that in the case of an array operand, the result is an lvalue if that operand is an lvalue and an xvalue otherwise.
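In C++ the rule quoted above also covers the C89 example from the previous answer: subscripting an array member of a temporary is allowed (per the quoted wording, the result is an xvalue). A sketch assuming C++11; S and foo mirror that C example, and element_of_temporary is a hypothetical name.

```cpp
#include <cassert>

struct S { int a[10]; };

S foo() {
    S s = {};     // all ten elements zero-initialized
    s.a[5] = 42;  // give one element a recognizable value
    return s;
}

// foo().a is an array member of a temporary; [] still applies to it.
int element_of_temporary() {
    return foo().a[5];
}
```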
Since "example" has type char const[8], it may decay to char const* (it used to decay to char* as well, but that is mostly a relic of the past), which makes it a pointer.
At which point the expression myPtr[1] becomes *(myPtr + 1) which is well defined.
Pointers hold the memory address of a variable of the specific data type they are declared to point to. As others have pointed out, this counter-intuitive approach takes a bit of a learning curve to understand.
Note that the string "example" itself is immutable; however, the compiler does not prevent you from writing through the non-const pointer myPtr. The assignment myPtr[1] = 'x' does not change the pointer; it tries to overwrite the second character of the literal itself:
char* myPtr = "example";
myPtr[1] = 'x';
Since myPtr refers to read-only data, writing through it is undefined behavior, and the program will typically crash at run time even though it compiles without issues.
From a C perspective, you are writing through a pointer to non-const char here.
In C, a plain char pointer lets you modify the characters it points to unless the pointed-to type is qualified with const. Note that const char *ptr makes the characters read-only through ptr; it does not fix the pointer itself, which can still be assigned another address (a pointer that cannot be reassigned is declared as char *const ptr).
Let's say your code looked like this:
const char *ptr ="example";
ptr[1] = 'x';
Now the compilation will fail, because you cannot modify the characters through a pointer to const char.
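A sketch of where const can go, assuming C++11; const_placement_demo is a hypothetical name. const char* protects the characters, char* const protects the pointer, and copying the literal into an array gives storage that may be modified. The lines that would not compile are left as comments.

```cpp
#include <cassert>
#include <cstring>

bool const_placement_demo() {
    const char* ptr = "example";  // pointer to const char
    // ptr[1] = 'k';              // error: the characters are read-only
    ptr = "other";                // fine: the pointer itself can be reseated

    char buf[] = "example";       // a modifiable copy of the literal
    char* const fixed = buf;      // const pointer to non-const char
    fixed[1] = 'k';               // fine: writes into buf, not into a literal
    // fixed = nullptr;           // error: the pointer cannot be reseated

    return std::strcmp(ptr, "other") == 0 && std::strcmp(buf, "ekample") == 0;
}
```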
You should use a char pointer only to access the individual characters of a string.
If you want to process characters read from standard input, I suggest you store each character in an int, as in the following example (getchar() returns an int so that it can also report EOF):
#include <stdio.h>
int main()
{
    int countBlank = 0, countTab = 0, countNewLine = 0, c;
    while ((c = getchar()) != EOF)
    {
        if (c == ' ')
            ++countBlank;
        else if (c == '\t')
            ++countTab;
        else if (c == '\n')
            ++countNewLine;
        putchar(c);
    }
    printf("Blanks = %d\nTabs = %d\nNew Lines = %d", countBlank, countTab, countNewLine);
    return 0;
}
Notice how the int variable holds each character code returned by getchar(), which is then printed with putchar().
A special thanks to Keith Thompson; I learnt some useful things here today.
The most important thing to remember is this:
Arrays are not pointers.
But there are several language rules in both C and C++ that can make it seem as if they're the same thing. There are contexts in which an expression of array type or an expression of pointer type is legal. In those contexts, the expression of array type is implicitly converted to yield a pointer to the array's initial element.
char an_array[] = "hello";
const char *a_pointer = "goodbye";
an_array is an array object, of type char[6]. The string literal "hello" is used to initialize it.
a_pointer is a pointer object, of type const char*. You need the const because the string literal used to initialize it is read-only.
When an expression of array type (usually the name of an array object) appears in an expression, it is usually implicitly converted to a pointer to its initial (0th) element. So, for example, we can write:
char *ptr = an_array;
an_array is an array expression; it's implicitly converted to a char* pointer. The above is exactly equivalent to:
char *ptr = &(an_array[0]); // parentheses just for emphasis
There are 3 contexts in which an array expression is not converted to a pointer value:
When it's the operand of the sizeof operator. sizeof an_array yields the size of the array, not the size of a pointer.
When it's the operand of the unary & operator. &an_array yields the address of the entire array object, not the address of some (nonexistent) char* pointer object. It's of type "pointer to array of 6 chars", or char (*)[6].
When it's a string literal used as an initializer for an array object. In the example above:
char an_array[] = "hello";
the contents of the string literal "hello" are copied into an_array; it doesn't decay to a pointer.
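The conversion and its exceptions can be checked directly. A sketch assuming C++11 (for decltype and std::is_same); an_array follows the example above and decays_to_first_element is a hypothetical name.

```cpp
#include <cassert>
#include <type_traits>

char an_array[] = "hello";  // the literal's contents are copied: char[6]

// sizeof sees the whole array, not a pointer.
static_assert(sizeof an_array == 6, "six chars including the terminator");

// Unary & yields a pointer to the whole array, of type char (*)[6].
static_assert(std::is_same<decltype(&an_array), char (*)[6]>::value,
              "&an_array points to the entire array");

// In ordinary contexts the array converts to a pointer to its first element.
bool decays_to_first_element() {
    char* ptr = an_array;
    return ptr == &an_array[0];
}
```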
Finally, there's one more language rule that can make it seem as if arrays were "really" pointers: a parameter defined with an array type is adjusted so that it's really of pointer type. You can define a function like:
void func(char param[10]);
and it really means:
void func(char *param);
The 10 is silently ignored.
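A sketch of the parameter adjustment, assuming C++11; here func returns a flag instead of doing real work, and accepts_any_length is a hypothetical helper. Inside func the parameter's declared type is already char*, which is why the bound is never checked.

```cpp
#include <cassert>
#include <type_traits>

// Declared with an array type, but adjusted to char* by the language.
bool func(char param[10]) {
    return std::is_same<decltype(param), char*>::value;
}

// Because of the adjustment, a 3-element array is accepted just as
// readily as a 10-element one.
bool accepts_any_length() {
    char small[3] = {'a', 'b', '\0'};
    return func(small);
}
```

For the same reason, sizeof param inside such a function gives the size of a pointer, and compilers often warn about it.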
The [] indexing operator requires two operands, a pointer and an integer. The pointer must point to an element of an array object. (A standalone object is treated as a 1-element array.) The expression
arr[i]
is by definition equivalent to
*(arr + i)
Adding an integer to a pointer value yields a new pointer that's advanced i elements forward in the array.
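A small sketch of that arithmetic, assuming C++11; arithmetic_demo is a hypothetical name. Pointer distances are counted in elements, not bytes.

```cpp
#include <cassert>

bool arithmetic_demo() {
    int arr[5] = {10, 20, 30, 40, 50};
    int* p = arr;                    // the array converts to &arr[0]
    int* q = p + 3;                  // advanced 3 elements, not 3 bytes
    return q - p == 3                // distance is counted in elements
        && *q == 40                  // q points at arr[3]
        && arr[3] == *(arr + 3);     // arr[i] is *(arr + i)
}
```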
Section 6 of the comp.lang.c FAQ has an excellent explanation of all this stuff. (It applies to C++ as well as to C; the two languages have very similar rules in this area.)
In C++, your code generates a warning during compile:
{
    //char* myPtr = "example"; // ISO C++ forbids converting a string
    //                         // constant to ‘char*’ [-Wpedantic]
    // instead you should use the following form
    char myPtr[] = "example"; // a c-style null terminated string
    // the myPtr symbol is also treated as a char*, and not a const char*
    myPtr[1] = 'k'; // still works,
    std::cout << myPtr << std::endl; // output is 'ekample'
}
On the other hand, std::string is much more flexible, and has many more features:
{
    std::string myPtr = "example";
    myPtr[1] = 'k'; // works the same
    // then, to print the corresponding null terminated c-style string
    std::cout << myPtr.c_str() << std::endl;
    // ".c_str()" is useful to create input to system calls requiring
    // null terminated c-style strings
}
The semantics of abc[x] are: take the address abc, add x*sizeof(type) bytes to it, and dereference the result, where abc is any pointer into memory. Array variables behave like pointers here: they are converted to a pointer to the beginning of the memory allocated for the array.
Hence adding x to an array or to a pointer variable yields an address x*sizeof(element type) bytes past the start, where the element type is the type the array contains or the pointer points to (for an int pointer or int array, sizeof(int) is typically 4).
Array variables are not the same as pointers, though, as Keith pointed out in a comment: an array declaration creates a fixed-size block of memory, and sizeof applied to the array, or arithmetic on a pointer to the whole array (&array), uses the size of the entire array rather than the size of its elements.
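Keith's point can be made concrete. A sketch assuming C++11; steps_demo is a hypothetical name. A pointer to an element steps by sizeof(int), whereas a pointer to the whole array steps by the size of the array; the final comparison goes through const void* because the two pointers have different types.

```cpp
#include <cassert>

bool steps_demo() {
    int arr[4] = {1, 2, 3, 4};
    int* p = arr;          // pointer to int: steps by sizeof(int)
    int (*pa)[4] = &arr;   // pointer to int[4]: steps by sizeof arr
    static_assert(sizeof *p == sizeof(int), "element-sized step");
    static_assert(sizeof *pa == 4 * sizeof(int), "array-sized step");
    // One step of pa lands where four steps of p land: one past the end.
    return static_cast<const void*>(pa + 1) == static_cast<const void*>(p + 4);
}
```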

Would initialising a character array from a string literal be a case of array copy initialisation?

I had always thought it fine to, in my mind, replace any use of a literal with a temporary variable of that literal's type and value. If this is the case, since string literals are of type array of const char would initialising a character array through a string literal not be considered array copy-initialisation? E.g. wouldn't
const char test1[] = "hello";
be somewhat the same as doing...
const char temp[6] = {'h', 'e', 'l', 'l', 'o', '\0'};
const char test2[] = temp;
which would be forbidden since this is an example of array copy initialisation? How is it that string literals can be used to initialise an array if the literal's type is an array? Maybe somewhat related, if string literals are of type array of const char then how is it the following code seems to compile fine on my system?
char* test3 = "hello";
Since test3 is missing the low-level const, shouldn't the compiler reject this unlawful conversion? Yet it compiles fine. Of course, trying to change any element through test3 causes the program to crash.
There is no difference between copy or direct-initialization for arrays. Both cases are handled identically by the compiler. The analogy you make in the beginning is more of a rule of thumb. In reality, an array cannot be initialized by another array unless it is a string literal. BTW your analogy is not entirely correct. The target array would be direct-initialized with the temporary array:
const char test2[](test1);
But this still won't compile for the same reason. This is how initialization of a character array works.
[dcl.init]/p17:
The semantics of initializers are as follows. The destination type is the type of the object or reference being initialized and the source type is the type of the initializer expression. If the initializer is not a single (possibly parenthesized) expression, the source type is not defined.
If the initializer is a (non-parenthesized) braced-init-list, the object or reference is list-initialized (8.5.4).
If the destination type is a reference type, see 8.5.3.
If the destination type is an array of characters, an array of char16_t, an array of char32_t, or an array of wchar_t, and the initializer is a string literal, see 8.5.2.
8.5.2:
An array of narrow character type (3.9.1), char16_t array, char32_t array, or wchar_t array can be initialized by a narrow string literal, char16_t string literal, char32_t string literal, or wide string literal,
respectively, or by an appropriately-typed string literal enclosed in braces (2.13.5). Successive characters of the value of the string literal initialize the elements of the array. [ Example:
char msg[] = "Syntax error on line %s\n";
shows a character array whose members are initialized with a string-literal. [..]
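A sketch of the contrast, assuming C++11; copy_demo is a hypothetical name. A string literal may initialize a char array, another array may not, so the elements have to be copied explicitly; alternatively, std::array (a class wrapping a built-in array) is copyable as a whole.

```cpp
#include <array>
#include <cassert>
#include <cstring>

bool copy_demo() {
    const char test1[] = "hello";             // string-literal rule: const char[6]
    // const char test2[] = test1;            // ill-formed: array initialized from array
    char test2[sizeof test1];
    std::memcpy(test2, test1, sizeof test1);  // element-wise copy instead

    std::array<char, 6> a = {{'h', 'e', 'l', 'l', 'o', '\0'}};
    std::array<char, 6> b = a;                // the wrapper class is copyable
    return std::strcmp(test2, "hello") == 0 && b == a && sizeof test1 == 6;
}
```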
In your other example, the string literal decays into a pointer to its first element, with which test3 is initialized. This code is invalid in C++11 (see note 1), because the decayed pointer is const char*. The conversion is valid in C, where string literals are not const, and it was allowed in C++03, where it was deprecated.
1: Some compilers still allow the conversion in C++11 as an extension.

Assigning a const-string to a constant sized char array, what happens in un-used array indices?

Let's say I have:
char name[16] = "123456789abc";
so name[11] == 'c', name[12] == '\0'.
Will name[13] be gibberish/compiler-dependent, or will it reliably be a specific value (such as '\0')?
When a character array is initialized from a string literal, unused elements are initialized to zero.
Section 8.5.2 has the rule:
An array of narrow character type (3.9.1), char16_t array, char32_t array, or wchar_t array can be initialized by a narrow string literal, char16_t string literal, char32_t string literal, or wide string literal, respectively, or by an appropriately-typed string literal enclosed in braces (2.14.5). Successive characters of the value of the string literal initialize the elements of the array.
There shall not be more initializers than there are array elements.
If there are fewer initializers than there are array elements, each element not explicitly initialized shall be zero-initialized (8.5).
Therefore, they will be zero. Guaranteed.
And accessing them is not undefined behavior.
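This can be checked directly. A minimal sketch assuming C++11; tail_is_zero is a hypothetical helper.

```cpp
#include <cassert>

// True if name[11] is 'c' and every element after the literal's
// characters (indices 12 to 15) is '\0'.
bool tail_is_zero() {
    char name[16] = "123456789abc";  // 12 characters, then 4 zeroes
    for (int i = 12; i < 16; ++i) {
        if (name[i] != '\0') return false;
    }
    return name[11] == 'c';
}
```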
If you initialized from a list of characters instead char name[16] = { '1', '2', '3', '4', '5', 0 };, you'd be in the realm of aggregate initialization, which gives the same result through a different route.
When aggregate initialization is used and there are fewer initializers than elements of the aggregate, the remainder are value initialized (unless there is a brace-or-equal-initializer in the definition of the aggregate type).
The rule is found in section 8.5.1:
An initializer-list is ill-formed if the number of initializer-clauses exceeds the number of members or elements to initialize.
If there are fewer initializer-clauses in the list than there are members in the aggregate, then each member not explicitly initialized shall be initialized from its brace-or-equal-initializer or, if there is no brace-or-equal-initializer, from an empty initializer list (8.5.4).
And there is an example given:
struct S { int a; const char* b; int c; int d = b[a]; };
S ss = { 1, "asdf" };
initializes ss.a with 1, ss.b with "asdf", ss.c with the value of an expression of the form int{} (that
is, 0), and ss.d with the value of ss.b[ss.a] (that is, 's')
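The quoted example compiles as-is, assuming C++14 or later (in C++11 a class with a default member initializer such as d = b[a] was not an aggregate). make_ss is a hypothetical helper added only to return the object.

```cpp
#include <cassert>

struct S { int a; const char* b; int c; int d = b[a]; };

// a and b come from the list; c gets int{} (0); d uses its default initializer.
S make_ss() {
    S ss = { 1, "asdf" };
    return ss;
}
```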
C++03 didn't explicitly state that extra elements would be zero-initialized in the character array rule. On the other hand the aggregate rule was substantially similar and did guarantee value initialization, always (brace-or-equal-initializer was introduced in C++11).
C99 section 6.7.8 provides zero initialization in both cases, to wit:
If there are fewer initializers in a brace-enclosed list than there are elements or members
of an aggregate, or fewer characters in a string literal used to initialize an array of known
size than there are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage duration.
Of course, objects with static storage duration are pre-initialized to zero.

C/C++ int[] vs int* (pointers vs. array notation). What is the difference?

I know that arrays in C are just pointers to sequentially stored data. But what differences does the difference in notation, [] versus *, imply? I mean in ALL possible usage contexts.
For example:
char c[] = "test";
if you provide this instruction in a function body it will allocate the string on a stack while
char* c = "test";
will point to a data (readonly) segment.
Can you list all the differences between these two notations in ALL usage contexts to form a clear general view?
According to the C99 standard:
An array type describes a contiguously allocated nonempty set of
objects with a particular member object type, called the element
type.
Array types are characterized by their element type and by
the number of elements in the array. An array type is said to be
derived from its element type, and if its element type is T, the array
type is sometimes called array of T. The construction of an array
type from an element type is called array type derivation.
A pointer type may be derived from a function type, an object type, or
an incomplete type, called the referenced type. A pointer type
describes an object whose value provides a reference to an entity of
the referenced type. A pointer type derived from the referenced type T
is sometimes referred to as a pointer to T. The construction of a pointer
type from a referenced type is called pointer type derivation.
According to the standard, the declarations…
char s[] = "abc", t[3] = "abc";
char s[] = { 'a', 'b', 'c', '\0' }, t[] = { 'a', 'b', 'c' };
…are identical. The contents of the arrays are modifiable. On the other hand, the declaration…
const char *p = "abc";
…defines p with the type as pointer to constant char and initializes it to point to an object with type constant array of char (in C++) with length 4 whose elements are initialized with a character string literal. If an attempt is made to use p to modify the contents of the array, the behavior is undefined.
According to 6.5.2.1 Array subscripting, dereferencing and array subscripting are identical:
The definition of the subscript operator [] is that E1[E2] is
identical to (*((E1)+(E2))).
The differences between arrays and pointers are:
a pointer carries no information about the size of the memory behind it (there is no portable way to get it)
an array of incomplete type cannot be constructed
a pointer type may be derived from an incomplete type
a pointer can define a recursive structure (this one is the consequence of the previous two)
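Two of these differences in code, as a sketch assuming C++11; length_of, Node, demo_length and second_value are hypothetical names. An array's bound is part of its type and can be deduced, while a pointer carries no size; and a pointer to the still-incomplete Node is allowed inside Node, whereas an array of Node would not be.

```cpp
#include <cassert>
#include <cstddef>

// Works only for real arrays: the bound N is part of the type.
template <typename T, std::size_t N>
constexpr std::size_t length_of(const T (&)[N]) { return N; }

struct Node {
    int value;
    Node* next;       // fine: pointer to the incomplete type Node
    // Node kids[2];  // error: array of incomplete type
};

std::size_t demo_length() {
    const char s[] = "test";
    // const char* p = s; length_of(p);  // would not compile: no bound to deduce
    return length_of(s);                 // 4 characters plus '\0'
}

int second_value() {
    Node second = {2, nullptr};
    Node first = {1, &second};  // a recursive structure built via the pointer
    return first.next->value;
}
```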
More helpful information on the subject can be found at http://www.cplusplus.com/forum/articles/9/
char c[] = "test";
This will create an array containing the string test so you can modify/change any character, say
c[2] = 'p';
but,
char * c = "test";
Here c points to a string literal, which is read-only (in C++ its type is an array of const char).
Modifying a string literal is undefined behavior and in practice usually causes a segfault. So
c[2] = 'p';
is not allowed here and will typically segfault.
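A sketch of the safe variant, assuming C++11; modify_copy is a hypothetical name. The array receives its own copy of the literal, so writing to it is fine. The pointer version is left commented out because in standard C++11 char* q = "test"; does not compile, and writing through such a pointer is undefined behavior where compilers accept it.

```cpp
#include <cassert>
#include <cstring>

bool modify_copy() {
    char c[] = "test";    // a modifiable char[5] holding a copy of the literal
    c[2] = 'p';           // fine: changes the copy, not the literal
    // char* q = "test";  // ill-formed in C++11 (const char[5] to char*)
    // q[2] = 'p';        // undefined behavior where compilers accept it
    return std::strcmp(c, "tept") == 0;
}
```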
char [] denotes the type "array of unknown bound of char", while char * denotes the type "pointer to char". As you've observed, when a definition of a variable of type "array of unknown bound of char" is initialised with a string literal, the type is converted to "array[N] of char" where N is the appropriate size. The same applies in general to initialisation from array aggregate:
int arr[] = { 0, 1, 2 };
arr is converted to type "array[3] of int".
In a user-defined type definition (struct, class or union), array-of-unknown-bound types are prohibited in C++, although in some versions of C they are allowed as the last member of a struct, where they can be used to access allocated memory past the end of the struct; this usage is called "flexible arrays".
Recursive type construction is another difference: one can form pointers to and arrays of char * (e.g. char ** and char *[10]), but an array of unknown bound cannot be the element type of another array: char a[10][] is ill-formed, whereas char a[][10] (an array of unknown bound whose elements are char[10]) and char (*p)[] (a pointer to an array of unknown bound) are fine.
Finally, cv-qualification operates differently; given typedef char *ptr_to_char and typedef char array_of_unknown_bound_of_char[], cv-qualifying the pointer version will behave as expected, while cv-qualifying the array version will migrate the cv-qualification to the element type: that is, const array_of_unknown_bound_of_char is equivalent to const char [] and not the fictional char (const) []. This means that in a function definition, where a parameter of array type is adjusted to a pointer type when the function's type is formed,
void foo (int const a[]) {
a = 0;
}
is legal; in C++ there is no way to make the array-of-unknown-bound parameter itself non-modifiable (C99 allows void foo (int a[const]) for this purpose).
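A sketch of that adjustment in C++, assuming C++11. foo mirrors the example above but returns a flag so that the parameter's type becomes observable: the parameter is really const int*, so the pointer can be reseated while the elements stay read-only.

```cpp
#include <cassert>
#include <type_traits>

bool foo(int const a[]) {
    a = nullptr;  // legal: a is a const int*, and only the ints are const
    // a[0] = 1;  // would be an error: the elements are const
    return a == nullptr && std::is_same<decltype(a), const int*>::value;
}
```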
The whole lot becomes clear if you know that declaring a pointer variable does not create a variable of the type it points at. It creates a pointer variable.
So, in practice, if you need a string then you need to specify an array of characters and a pointer can be used later on.
Arrays behave in some ways like constant pointers: an array name cannot be assigned a new address.
Also, char c[] allocates memory for the array itself, and c names that memory directly; no separate memory is allocated for storing its address.
Writing char *c = "test" makes c point to the string, which lives in separate (static) storage; the string's base address is stored in c, and a separate memory location is used to store c itself.
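A sketch of the storage difference, assuming C++11; sizes_demo is a hypothetical name. sizeof reports the array's own bytes for the array, but only the size of an address for the pointer.

```cpp
#include <cassert>

bool sizes_demo() {
    char arr[] = "test";       // the array itself holds 5 chars
    const char* ptr = "test";  // ptr holds only an address; the chars live elsewhere
    return sizeof arr == 5 && sizeof ptr == sizeof(const char*);
}
```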

Can a string literal be subscripted in a constant expression?

This is valid, because a constexpr expression is allowed to take the value of "a glvalue of literal type that refers to a non-volatile object defined with constexpr, or that refers to a sub-object of such an object" (§5.19/2):
constexpr char str[] = "hello, world";
constexpr char e = str[1];
However, it would seem that string literals do not fit this description:
constexpr char e = "hello, world"[1]; // error: literal is not constexpr
2.14.5/8 describes the type of string literals:
Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow string literal has type “array of n const char”, where n is the size of the string as defined below, and has static storage duration.
It would seem that an object of this type could be indexed, if only it were temporary and not of static storage duration (5.19/2, right after the above snippet):
[constexpr allows lvalue-to-rvalue conversion of] … a glvalue of literal type that refers to a non-volatile temporary object whose lifetime has not ended, initialized with a constant expression
This is particularly odd since taking the lvalue of a temporary object is usually "cheating." I suppose this rule applies to function arguments of reference type, such as in
constexpr char get_1( char const (&str)[ 6 ] )
{ return str[ 1 ]; }
constexpr char i = get_1( { 'y', 'i', 'k', 'e', 's', '\0' } ); // OK
constexpr char e = get_1( "hello" ); // error: string literal not temporary
For what it's worth, GCC 4.7 accepts get_1( "hello" ), but rejects "hello"[1] because "the value of ‘._0’ is not usable in a constant expression"… yet "hello"[1] is acceptable as a case label or an array bound.
I'm splitting some Standardese hairs here… is the analysis correct, and was there some design intent for this feature?
EDIT: Oh… there is some motivation for this. It seems that this sort of expression is the only way to use a lookup table in the preprocessor. For example, this introduces a block of code which is ignored unless SOME_INTEGER_FLAG is 1 or 5, and causes a diagnostic if greater than 6:
#if "\0\1\0\0\0\1"[ SOME_INTEGER_FLAG ]
This construct would be new to C++11.
The intent is that this works. The paragraphs that state when an lvalue-to-rvalue conversion is valid will be amended in a post-C++11 draft with a note stating that an lvalue referring to a subobject of a string literal refers to a constant integer object initialized with a constant expression, which is one of the allowed cases.
Your comment about the use within the preprocessor looks interesting, but I'm unsure whether that is intended to work. This is the first time I have heard of it.
Regarding your question about #if, it was not the intent of the standards committee to increase the set of expressions which can be used in the preprocessor, and the current wording is considered to be a defect. This will be listed as core issue 1436 in the post-Kona WG21 mailing. Thanks for bringing this to our attention!
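Under the resolution described above (C++14's wording adds a note that a string literal corresponds to an array of const objects initialized with constant expressions), the subscripts from the question should be accepted as constant expressions by recent compilers. A sketch assuming C++14 or later; get_1 is generalized here to any bound N, and the preprocessor use is deliberately not shown since it is not intended to work.

```cpp
#include <cassert>
#include <cstddef>

constexpr char str[] = "hello, world";
constexpr char e1 = str[1];             // the case the question calls valid
constexpr char e2 = "hello, world"[1];  // subscripting the literal directly

// Reference-to-array parameter, generalized to any bound N.
template <std::size_t N>
constexpr char get_1(const char (&s)[N]) { return s[1]; }

constexpr char e3 = get_1("hello");

static_assert(e1 == 'e' && e2 == 'e' && e3 == 'e', "all three read index 1");
```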