I have a question about the array name a
int a[10]
How is the array name defined in C++? A constant pointer? It is defined like this or just we can look it like this? What operations can be applied on the name?
The C++ standard defines what an array is and its behaviour. Take a look in the index. It's not a pointer, const or otherwise, and it's not anything else, it's an array.
To see a difference:
int a[10];
int *const b = a;
std::cout << sizeof(a); // prints "40" on my machine.
std::cout << sizeof(b); // prints "4" on my machine.
Clearly a and b are not the same type, since they have different sizes.
In most contexts, an array name "decays" to a pointer to its own first element. You can think of this as an automatic conversion. The result is an rvalue, meaning that it's "just" a pointer value, and can't be assigned to, similar to when a function name decays to a function pointer. Doesn't mean it's "const" as such, but it's not assignable.
So an array "is" a pointer much like a function "is" a function pointer, or a long "is" an int. That is to say, it isn't really, but you can use it as one in most contexts thanks to the conversion.
An array name is not a constant pointer - however it acts like one in so many contexts (it converts to one on sight pretty much) that for most purposes it is.
From 6.3.2.1/3 "Other operands/Lvalues, arrays,and function designators":
Except when it is the operand of the sizeof operator or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue.
Related
Here is the code I'm having trouble to understand:
char* myPtr = "example";
myPtr[1] = 'x';
How am I allowed to use myPtr[1]? Why can I choose positions like a do on arrays? myPtr is not even an array.
Obs. I know about lookup table, literal pooling and string literals, my concern is just how this even compile. I don't use pointers that much.
Can anyone help?
Apparently you made an assumption that applicability of [] operator to something necessarily implies that that "something" is an array. This is not true. The built-in [] operator has no direct relation to arrays. The [] is just a shorthand for a combination of * and + operators: by definition a[b] means *(a + b), where one operand is required to be a pointer and another is required to be an integer.
Moreover, when you apply the [] operator to an actual array, that array gets implicitly converted to a pointer type first, and only then the resultant pointer can act as an operand of [] operator. This actually means the opposite of what you supposedly assumed initially: operator [] never works with arrays. By the time we get to the [] the array has already decayed to a pointer.
As a related side-note, this latter detail manifests itself in one obscure peculiarity of the first C language standard. In C89/90 the array-to-pointer conversion was not allowed for rvalue arrays, which also prevented the [] operator from working with such arrays
struct S { int a[10]; };
struct S foo(void) { struct S s = { 0 }; return s; }
int main()
{
foo().a[5];
/* ERROR: cannot convert array to pointer, and therefore cannot use [] */
return 0;
}
C99 expanded the applicability of that conversion thus making the above code valid.
It compiles according to §5.2.1/1 [expr.sub] of the C++ standard:
A postfix expression followed by an expression in square brackets is a postfix expression. One of the expressions shall have the type “array of T” or “pointer to T” and the other shall have unscoped enumeration or integral type. The result is of type “T”. The type “T” shall be a completely-defined object type.
The expression E1[E2] is identical (by definition) to *((E1)+(E2)), except that in the case of an array operand, the result is an lvalue if that operand is an lvalue and an xvalue otherwise.
Since "example" has type char const[8] it may decay to char const* (it used to decay to char* as well, but it's mostly a relict of the past) which makes it a pointer.
At which point the expression myPtr[1] becomes *(myPtr + 1) which is well defined.
Pointers hold the address of memory location of variables of specific data types they are assigned to hold. As others have pointed out its counter-intuitive approach take a bit of learning curve to understand.
Note that the string "example" itself is immutable however, the compiler doesn't prevent the manipulation of the pointer variable, whose new value is changed to address of string 'x' (this is not same as the address of x in 'example'),
char* myPtr = "example";
myPtr[1] = 'x';
Since myPtr is referencing immutable data when the program runs it will crash, though it compiles without issues.
From C perspective, here, you are dereferencing a mutable variable.
By default in C, the char pointer is defined as mutable, unless specifically stated as immutable through keyword const, in which case the binding becomes inseparable and hence you cannot assign any other memory address to the pointer variable after defining it.
Lets say your code looked like this,
const char *ptr ="example";
ptr[1] = 'x';
Now the compilation will fail and you cannot modify the value as this pointer variable is immutable.
You should use char pointer only to access the individual character in a string of characters.
If you want to do string manipulations then I suggest you declare an int to store each character's ASCII values from the standard input output like mentioned here,
#include<stdio.h>
int main()
{
int countBlank=0,countTab=0,countNewLine=0,c;
while((c=getchar())!=EOF)
{
if(c==' ')
++countBlank;
else if(c=='\t')
++countTab;
else if(c=='\n')
++countNewLine;
putchar(c);
}
printf("Blanks = %d\nTabs = %d\nNew Lines = %d",countBlank,countTab,countNewLine);
}
See how the integer takes ASCII values in order to get and print individual characters using getchar() and putchar().
A special thanks to Keith Thompson here learnt some useful things today.
The most important thing to remember is this:
Arrays are not pointers.
But there are several language rules in both C and C++ that can make it seem as if they're the same thing. There are contexts in which an expression of array type or an expression of pointer type is legal. In those contexts, the expression of array type is implicitly converted to yield a pointer to the array's initial element.
char an_array[] = "hello";
const char *a_pointer = "goodbye";
an_array is an array object, of type char[6]. The string literal "hello" is used to initialize it.
a_pointer is a pointer object, of type const char*. You need the const because the string literal used to initialize it is read-only.
When an expression of array type (usually the name of an array object) appears in an expression, it is usually implicitly converted to a pointer to its initial (0th) element. So, for example, we can write:
char *ptr = an_array;
an_array is an array expression; it's implicitly converted to a char* pointer. The above is exactly equivalent to:
char *ptr = &(an_array[0]); // parentheses just for emphasis
There are 3 contexts in which an array expression is not converted to a pointer value:
When it's the operand of the sizeof operator. sizeof an_array yields the size of the array, not the size of a pointer.
When it's the operand of the unary & operator. &an_array yields the address of the entire array object, not the address of some (nonexistent) char* pointer object. It's of type "pointer to array of 6 chars", or char (*)[6].
When it's a string literal used as an initializer for an array object. In the example above:
char an_array[] = "hello";
the contents of the string literal "hello" are copied into an_array; it doesn't decay to a pointer.
Finally, there's one more language rule that can make it seem as if arrays were "really" pointer: a parameter defined with an array type is adjusted so that it's really of pointer type. You can define a function like:
void func(char param[10]);
and it really means:
void func(char *param);
The 10 is silently ignored.
The [] indexing operator requires two operands, a pointer and an integer. The pointer must point to an element of an array object. (A standalone object is treated as a 1-element array.) The expression
arr[i]
is by definition equivalent to
*(arr + i)
Adding an integer to a pointer value yields a new pointer that's advanced i elements forward in the array.
Section 6 of the comp.lang.c FAQ has an excellent explanation of all this stuff. (It applies to C++ as well as to C; the two languages have very similar rules in this area.)
In C++, your code generates a warning during compile:
{
//char* myPtr = "example"; // ISO C++ forbids converting a string
// constant to ‘char*’ [-Wpedantic]
// instead you should use the following form
char myPtr[] = "example"; // a c-style null terminated string
// the myPtr symbol is also treated as a char*, and not a const char*
myPtr[1] = 'k'; // still works,
std::cout << myPtr << std::endl; // output is 'ekample'
}
On the other hand, std::string is much more flexible, and has many more features:
{
std::string myPtr = "example";
myPtr[1] = 'k'; // works the same
// then, to print the corresponding null terminated c-style string
std::cout << myPtr.c_str() << std::endl;
// ".c_str()" is useful to create input to system calls requiring
// null terminated c-style strings
}
The semantics of abc[x] is "Add x*sizeof(type)" to abc where abc is any memory pointer. Arrays variable behave like memory pointers and they just point to beginning of the memory location allocated to array.
Hence adding x to array or pointer variable both will point to memory which is same as variable pointing to + x*sizeof(type which array contains or pointer points to, e.g. in case of int pointers or int array it's 4)
Array variables are not same as pointer as said in comment by Keith as array declaration will create fix sized memory block and any arithmetic on that will use size of array not the element types in that array.
I read a question earlier that was closed due to being an exact duplicate of this
When a function has a specific-size array parameter, why is it replaced with a pointer?
and
How to find the 'sizeof' (a pointer pointing to an array)?
but after reading this I am still confused by how sizeof() works. I understand that passing an array as an argument to a function such as
void foo(int a[5])
will result in the array argument decaying to a pointer. What I did not find in the above 2 question links was a clear answer as to why it is that the sizeof() function itself is exempt from (or at least seemingly exempt from) this pointer decay behaviour. If sizeof() behaved like any other function then
int a[5] = {1,2,3,4,5};
cout << sizeof(a) << endl;
then the above should output 4 instead of 20. Have I missed something obvious as this seems to be a contradiction of the decay to pointer behaviour??? Sorry for bringing this up again but I really am having a hard time of understanding why this happens despite having happily used the function for years without really thinking about it.
Because the standard says so (emphasis mine):
(C99, 6.3.2.1p3) "Except when it is the operand of the sizeof operator or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue."
Note that for C++, the standard explicitly says the size is the size of the array:
(C++11, 5.3.3p2 sizeof) "[...] When applied to an array, the result is the total number of bytes in the array. This implies that the size of an array of n
elements is n times the size of an element."
sizeof is an operator, not a function. It's a specific one at that, too. The parentheses aren't even necessary if it's an expression:
int a;
sizeof (int); //needed because `int` is a type
sizeof a; //optional because `a` is an expression
sizeof (a); //^ also works
As you can see, it's on this chart of precedence as well. It's also one of the non-overloadable operators.
I was asked to give the output of the following code in an interview.
int a[] = {1,2,3,4,5};
int *p = &a + 1;
printf("%d, %d", *(a+1), *(p - 1));
I said I could not determine the result of the second one, so I failed the interview.
When I got back to home, and tried to compile the code, g++ will report an error, but gcc will only give a warning. The result printed is '2,5'.
Anyone knows why the C and C++ compiler behave differently on this?
a is an array of integers, which converts to a pointer to the first element when needed. a+1 invokes that conversion, and gives a pointer to the second element.
&a is a pointer to the array itself, not to the first element of it, so &a + 1 points beyond the end of the array (to the point where the second array would be, if it were a 2-dimensional array).
The code then converts that pointer (of type int (*)[5]) to a pointer to an integer (type int*). C++ doesn't allow such a conversion without an explicit reinterpret_cast, while C is more lenient in the pointer conversions it allows.
Finally (assuming that p and p2 are supposed to be the same thing), p - 1 points to the last element of a.
Well, a is array of 5 ints, so &a is a pointer to array of 5 ints. in C++, you can't assign that address in a int* without cast it. gcc (in C language) gives only warning, but I think that is not valid C.
For the code:
&a+1 is the next array after a, meaning, the address of the 6th element in a, so p-1 is the address of the 5th element of a.
(I'm not sure if &a+1 is legal. it's the address of the element that is after the array, which is usually legal, but since &a is not an array, it may be illegal.)
int a[] = {1,2,3,4,5};
int *p = &a + 1;
This is invalid C code.
The expression &a + 1 is of type int (*)[5]. You cannot assign an expression of type int (*)[5] to an int *.
Except with the generic object pointer type void *, there is no implicit conversion between object pointers. A cast is required to initialize p with &a + 1 value.
Where does C says this declaration is invalid?
int *p = &a + 1;
in the constraints of the assignment operator:
(C99, 6.5.16.1p1) "both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right"
And an int and an array type are not compatible types (see 6.2.7p1 for more on type compatibility).
well also, it is an initialization not an assignment, but the same constraint applies:
(C99, 6.7.8p11) "The initializer for a scalar shall be a single expression, optionally enclosed in braces. The initial value of the object is that of the expression (after conversion); the same type constraints and conversions as for simple assignment apply, taking the type of the scalar to be the unqualified version of its declared type."
The C++ compiler gave an error because &a evaluates to int(*)[5], which is a pointer to 5 integers, and you're attempting to assign it to an int*.
If you did fix the type, this code would print the second item in the array, followed by the address of the array.
My guess is that C decays the array into a pointer, resulting in &a evaluating to int**, and allows that to be assigned to int*. However, I don't do much C, so someone else may know better.
void DoSomeThing(CHAR parm[])
{
}
int main()
{
DoSomeThing(NULL);
}
Is the passing NULL for array parameter allowed in C/C++?
What you cannot do is pass an array by value. The signature of the function
void f( char array[] );
is transformed into:
void f( char *array );
and you can always pass NULL as argument to a function taking a pointer.
You can however pass an array by reference in C++, and in that case you will not be able to pass a NULL pointer (or any other pointer):
void g( char (&array)[ 10 ] );
Note that the size of the array is part of the type, and thus part of the signature, which means that g will only accept lvalue-expressions of type array of 10 characters
Short answer is yes, you can pass NULL in this instance (at least for C, and I think the same is true for C++).
There are two reasons for this. First, in the context of a function parameter declaration, the declarations T a[] and T a[N] are synonymous with T *a; IOW, despite the array notation, a is declared as a pointer to T, rather than an array of T. From the C language standard:
6.7.5.3 Function declarators (including prototypes)
...
7 A declaration of a parameter as ‘‘array of type’’ shall be adjusted to ‘‘qualified pointer to
type’’, where the type qualifiers (if any) are those specified within the [ and ] of the
array type derivation. If the keyword static also appears within the [ and ] of the
array type derivation, then for each call to the function, the value of the corresponding
actual argument shall provide access to the first element of an array with at least as many
elements as specified by the size expression.
The second reason is that when an expression of array type appears in most contexts (such as in a function call), the type of that expression is implicitly converted ("decays") to a pointer type, so what actually gets passed to the function is a pointer value, not an array:
6.3.2.1 Lvalues, arrays, and function designators
...
3 Except when it is the operand of the sizeof operator or the unary & operator, or is a
string literal used to initialize an array, an expression that has type ‘‘array of type’’ is
converted to an expression with type ‘‘pointer to type’’ that points to the initial element of
the array object and is not an lvalue. If the array object has register storage class, the
behavior is undefined.
First off, what you're passing to the function is a pointer and not an array. The array notation used for the parm parameter is just syntactic sugar for CHAR *parm.
So yes, you can pass a NULL pointer.
The only issue would be if DoSomeThing were overloaded, in which case the compiler might not be able to distinguish which function to call. But casting to the appropriate type would theoretically fix that.
I recently embarrassed myself while explaining to a colleague why
char a[100];
scanf("%s", &a); // notice a & in front of 'a'
is very bad and that the slightly better way to do it is:
char a[100];
scanf("%s", a); // notice no & in front of 'a'
Ok. For everybody getting ready to tell me why scanf should not be used anyway for security reasons: ease up. This question is actually about the meaning of "&a" vs "a".
The thing is, after I explained why it shouldn't work, we tried it (with gcc) and it works =)). I ran a quick
printf("%p %p", a, &a);
and it prints the same address twice.
Can anybody explain to me what's going on?
Well, the &a case should be obvious. You take the address of the array, exactly as expected.
a is a bit more subtle, but the answer is that a is the array. And as any C programmer knows, arrays have a tendency to degenerate into a pointer at the slightest provocation, for example when passing it as a function parameter.
So scanf("%s", a) expects a pointer, not an array, so the array degenerates into a pointer to the first element of the array.
Of course scanf("%s", &a) works too, because that's explicitly the address of the array.
Edit: Oops, looks like I totally failed to consider what argument types scanf actually expects. Both cases yield a pointer to the same address, but of different types. (pointer to char, versus pointer to array of chars).
And I'll gladly admit I don't know enough about the semantics for ellipsis (...), which I've always avoided like the plague, so looks like the conversion to whichever type scanf ends up using may be undefined behavior. Read the comments, and litb's answer. You can usually trust him to get this stuff right. ;)
Well, scanf expects a char* pointer as the next argument when seeing a "%s". But what you give it is a pointer to a char[100]. You give it a char(*)[100]. It's not guaranteed to work at all, because the compiler may use a different representation for array pointers of course. If you turn on warnings for gcc, you will see also the proper warning displayed.
When you provide an argument object that is an argument not having a listed parameter in the function (so, as in the case for scanf when has the vararg style "..." arguments after the format string), the array will degenerate to a pointer to its first element. That is, the compiler will create a char* and pass that to printf.
So, never do it with &a and pass it to scanf using "%s". Good compilers, as comeau, will warn you correctly:
warning: argument is incompatible with corresponding format string conversion
Of course, the &a and (char*)a have the same address stored. But that does not mean you can use &a and (char*)a interchangeably.
Some Standard quotes to especially show how pointer arguments are not converted to void* auto-magically, and how the whole thing is undefined behavior.
Except when it is the operand of the sizeof operator or the unary & operator, or is a
string literal used to initialize an array, an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object. (6.3.2.1/3)
So, that is done always - it isn't mentioned below explicitly anymore when listening valid cases when types may differ.
The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter. The default argument promotions are performed on trailing arguments. (6.5.2.2/7)
About how va_arg behaves extracting the arguments passed to printf, which is a vararg function, emphasis added by me (7.15.1.1/2):
Each invocation of the va_arg macro modifies ap so that the
values of successive arguments are returned in turn. The parameter type shall be a type
name specified such that the type of a pointer to an object that has the specified type can be obtained simply by postfixing a * to type. If there is no actual next argument, or if type is not compatible with the type of the actual next argument (as promoted according to the default argument promotions), the behavior is undefined, except for the following cases:
one type is a signed integer type, the other type is the corresponding unsigned integer
type, and the value is representable in both types;
one type is pointer to void and the other is a pointer to a character type.
Well, here is what that default argument promotion is:
If the expression that denotes the called function has a type that does not include a
prototype, the integer promotions are performed on each argument, and arguments that
have type float are promoted to double. These are called the default argument
promotions. (6.5.2.2/6)
It's been a while since I programmed in C but here's my 2c:
char a[100] doesn't allocate a separate variable for the address of the array, so the memory allocation looks like this:
---+-----+---
...|0..99|...
---+-----+---
^
a == &a
For comparison, if the array was malloc'd then there is a separate variable for the pointer, and a != &a.
char *a;
a = malloc(100);
In this case the memory looks like this:
---+---+---+-----+---
...| a |...|0..99|...
---+---+---+-----+---
^ ^
&a != a
K&R 2nd Ed. p.99 describes it fairly well:
The correspondence between indexing
and pointer arithmetic is very close.
By definition, the value of a variable
or expression of type array is the
address of element zero of the array.
Thus after the assignment pa=&a[0];
pa and a have identical values. Since
the name of the array is a synonym for
the location of the initial element,
the assignment pa=&a[0] can also be
written as pa=a;
A C array can be implicitly converted to a pointer to its first element (C99:TC3 6.3.2.1 §3), ie there are a lot of cases where a (which has type char [100]) will behave the same way as &a[0] (which has type char *). This explains why passing a as argument will work.
But don't start thinking this will always be the case: There are important differences between arrays and pointers, eg regarding assignment, sizeof and whatever else I can't think of right now...
&a is actually one of these pitfalls: This will create a pointer to the array, ie it has type char (*) [100] (and not char **). This means &a and &a[0] will point to the same memory location, but will have different types.
As far as I know, there is no implicit conversion between these types and they are not guaranteed to have a compatible representation as well. All I could find is C99:TC3 6.2.5 §27, which doesn't says much about pointers to arrays:
[...] Pointers to other types need not have the same representation or alignment requirements.
But there's also 6.3.2.3 §7:
[...] When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
So the cast (char *)&a should work as expected. Actually, I'm assuming here that the lowest addressed byte of an array will be the lowest addressed byte of its first element - not sure if this is guaranteed, or if a compiler is free to add arbitrary padding in front of an array, but if so, that would be seriously weird...
Anyway for this to work, &a still has to be cast to char * (or void * - the standard guarantees that these types have compatible representations). The problem is that there won't be any conversions applied to variable arguments aside from the default argument promotion, ie you have to do the cast explicitly yourself.
To summarize:
&a is of type char (*) [100], which might have a different bit-representation than char *. Therefore, an explicit cast must be done by the programmer, because for variable arguments, the compiler can't know to what it should convert the value. This means only the default argument promotion will be done, which, as litb pointed out, does not include a conversion to void *. It follows that:
scanf("%s", a); - good
scanf("%s", &a); - bad
scanf("%s", (char *)&a); - should be ok
Sorry, a tiny bit off topic:
This reminded me of an article I read about 8 years ago when I was coding C full time. I can't find the article but I think it was titled "arrays are not pointers" or something like that. Anyway, I did come across this C arrays and pointers FAQ which is interesting reading.
char [100] is a complex type of 100 adjacent char's, whose sizeof equals to 100.
Being casted to a pointer ((void*) a), this variable yields the address of the first char.
Reference to the variable of this type (&a) yields address of the whole variable, which, in turn, also happens to be the address of the first char