Here is the code I'm having trouble understanding:
char* myPtr = "example";
myPtr[1] = 'x';
How am I allowed to use myPtr[1]? Why can I choose positions like I do on arrays? myPtr is not even an array.
Obs. I know about lookup tables, literal pooling and string literals; my concern is just how this even compiles. I don't use pointers that much.
Can anyone help?
Apparently you made an assumption that applicability of [] operator to something necessarily implies that that "something" is an array. This is not true. The built-in [] operator has no direct relation to arrays. The [] is just a shorthand for a combination of * and + operators: by definition a[b] means *(a + b), where one operand is required to be a pointer and another is required to be an integer.
Moreover, when you apply the [] operator to an actual array, that array gets implicitly converted to a pointer type first, and only then the resultant pointer can act as an operand of [] operator. This actually means the opposite of what you supposedly assumed initially: operator [] never works with arrays. By the time we get to the [] the array has already decayed to a pointer.
As a related side-note, this latter detail manifests itself in one obscure peculiarity of the first C language standard. In C89/90 the array-to-pointer conversion was not allowed for rvalue arrays, which also prevented the [] operator from working with such arrays:
struct S { int a[10]; };
struct S foo(void) { struct S s = { 0 }; return s; }
int main()
{
foo().a[5];
/* ERROR: cannot convert array to pointer, and therefore cannot use [] */
return 0;
}
C99 expanded the applicability of that conversion thus making the above code valid.
It compiles according to §5.2.1/1 [expr.sub] of the C++ standard:
A postfix expression followed by an expression in square brackets is a postfix expression. One of the expressions shall have the type “array of T” or “pointer to T” and the other shall have unscoped enumeration or integral type. The result is of type “T”. The type “T” shall be a completely-defined object type.
The expression E1[E2] is identical (by definition) to *((E1)+(E2)), except that in the case of an array operand, the result is an lvalue if that operand is an lvalue and an xvalue otherwise.
Since "example" has type char const[8] it may decay to char const* (it used to decay to char* as well, but that is mostly a relic of the past), which makes it a pointer.
At which point the expression myPtr[1] becomes *(myPtr + 1) which is well defined.
A pointer holds the address of an object of the type it is declared to point to. As others have pointed out, this takes some getting used to.
Note that the string "example" itself is immutable. However, the compiler doesn't prevent you from writing through the pointer variable: the assignment below does not change where myPtr points, it attempts to overwrite the second character of the literal itself,
char* myPtr = "example";
myPtr[1] = 'x';
Since myPtr refers to immutable data, the behavior is undefined when the program runs; on most platforms it crashes, though it compiles without issues.
From the C perspective, you are writing through a pointer to non-const char.
By default in C, a char pointer allows modification of what it points to, unless the pointee is declared const. Note that const char *ptr makes the characters read-only, not the pointer: ptr can still be assigned another address. To make the pointer itself unchangeable after its definition you would write char *const ptr.
Lets say your code looked like this,
const char *ptr ="example";
ptr[1] = 'x';
Now the compilation will fail, because the characters ptr points to are declared const and cannot be modified through it.
You should use a char pointer only to access the individual characters in a string.
If you want to process characters from standard input, I suggest you declare an int to hold each character's value, because getchar() returns an int so that it can also report EOF, as in this example:
#include<stdio.h>
int main()
{
int countBlank=0,countTab=0,countNewLine=0,c;
while((c=getchar())!=EOF)
{
if(c==' ')
++countBlank;
else if(c=='\t')
++countTab;
else if(c=='\n')
++countNewLine;
putchar(c);
}
printf("Blanks = %d\nTabs = %d\nNew Lines = %d",countBlank,countTab,countNewLine);
}
See how the integer takes ASCII values in order to get and print individual characters using getchar() and putchar().
Special thanks to Keith Thompson here; I learnt some useful things today.
The most important thing to remember is this:
Arrays are not pointers.
But there are several language rules in both C and C++ that can make it seem as if they're the same thing. There are contexts in which an expression of array type or an expression of pointer type is legal. In those contexts, the expression of array type is implicitly converted to yield a pointer to the array's initial element.
char an_array[] = "hello";
const char *a_pointer = "goodbye";
an_array is an array object, of type char[6]. The string literal "hello" is used to initialize it.
a_pointer is a pointer object, of type const char*. You need the const because the string literal used to initialize it is read-only.
When an expression of array type (usually the name of an array object) appears in an expression, it is usually implicitly converted to a pointer to its initial (0th) element. So, for example, we can write:
char *ptr = an_array;
an_array is an array expression; it's implicitly converted to a char* pointer. The above is exactly equivalent to:
char *ptr = &(an_array[0]); // parentheses just for emphasis
There are 3 contexts in which an array expression is not converted to a pointer value:
When it's the operand of the sizeof operator. sizeof an_array yields the size of the array, not the size of a pointer.
When it's the operand of the unary & operator. &an_array yields the address of the entire array object, not the address of some (nonexistent) char* pointer object. It's of type "pointer to array of 6 chars", or char (*)[6].
When it's a string literal used as an initializer for an array object. In the example above:
char an_array[] = "hello";
the contents of the string literal "hello" are copied into an_array; it doesn't decay to a pointer.
Finally, there's one more language rule that can make it seem as if arrays were "really" pointers: a parameter defined with an array type is adjusted so that it's really of pointer type. You can define a function like:
void func(char param[10]);
and it really means:
void func(char *param);
The 10 is silently ignored.
The [] indexing operator requires two operands, a pointer and an integer. The pointer must point to an element of an array object. (A standalone object is treated as a 1-element array.) The expression
arr[i]
is by definition equivalent to
*(arr + i)
Adding an integer to a pointer value yields a new pointer that's advanced i elements forward in the array.
Section 6 of the comp.lang.c FAQ has an excellent explanation of all this stuff. (It applies to C++ as well as to C; the two languages have very similar rules in this area.)
In C++, your code generates a warning during compile:
{
//char* myPtr = "example"; // ISO C++ forbids converting a string
// constant to ‘char*’ [-Wpedantic]
// instead you should use the following form
char myPtr[] = "example"; // a c-style null terminated string
// the myPtr symbol is also treated as a char*, and not a const char*
myPtr[1] = 'k'; // still works,
std::cout << myPtr << std::endl; // output is 'ekample'
}
On the other hand, std::string is much more flexible, and has many more features:
{
std::string myPtr = "example";
myPtr[1] = 'k'; // works the same
// then, to print the corresponding null terminated c-style string
std::cout << myPtr.c_str() << std::endl;
// ".c_str()" is useful to create input to system calls requiring
// null terminated c-style strings
}
The semantics of abc[x] is "add x*sizeof(type) bytes to abc and access what is there", where abc is any pointer. Array variables behave like pointers in this context: they are converted to a pointer to the beginning of the memory allocated to the array.
Hence adding x to an array or to a pointer variable yields the address that is x*sizeof(type) bytes past the start, where type is what the array contains or the pointer points to (e.g. for int pointers or int arrays, sizeof(int) is typically 4).
Array variables are not the same as pointers, as Keith said in a comment: an array declaration creates a fixed-size memory block, and arithmetic on the address of the whole array (&arr + 1) steps by the size of the array, not the size of one element.
Related
What is array to pointer decay? Is there any relation to array pointers?
It's said that arrays "decay" into pointers. A C++ array declared as int numbers [5] cannot be re-pointed, i.e. you can't say numbers = 0x5a5aff23. More importantly the term decay signifies loss of type and dimension; numbers decay into int* by losing the dimension information (count 5) and the type is not int [5] any more. Look here for cases where the decay doesn't happen.
If you're passing an array by value, what you're really doing is copying a pointer - a pointer to the array's first element is copied to the parameter (whose type should also be a pointer to the array element's type). This works due to the array's decaying nature; once decayed, sizeof no longer gives the complete array's size, because it essentially becomes a pointer. This is why it's preferred (among other reasons) to pass by reference or pointer.
Three ways to pass in an array [1]:
void by_value(const T* array) // const T array[] means the same
void by_pointer(const T (*array)[U])
void by_reference(const T (&array)[U])
The last two will give proper sizeof info, while the first one won't since the array argument has decayed to be assigned to the parameter.
[1] The constant U should be known at compile-time.
Arrays are basically the same as pointers in C/C++, but not quite. Once you convert an array:
const int a[] = { 2, 3, 5, 7, 11 };
into a pointer (which works without casting, and therefore can happen unexpectedly in some cases):
const int* p = a;
you lose the ability of the sizeof operator to count elements in the array:
assert( sizeof(p) != sizeof(a) ); // sizes are not equal
This lost ability is referred to as "decay".
For more details, check out this article about array decay.
Here's what the standard says (C99 6.3.2.1/3 - Other operands - Lvalues, arrays, and function designators):
Except when it is the operand of the sizeof operator or the unary & operator, or is a
string literal used to initialize an array, an expression that has type ‘‘array of type’’ is
converted to an expression with type ‘‘pointer to type’’ that points to the initial element of
the array object and is not an lvalue.
This means that pretty much anytime the array name is used in an expression, it is automatically converted to a pointer to the 1st item in the array.
Note that function names act in a similar way, but function pointers are used far less often and in such a specialized way that they don't cause nearly as much confusion as the automatic conversion of array names to pointers.
The C++ standard (4.2 Array-to-pointer conversion) loosens the conversion requirement to (emphasis mine):
An lvalue or rvalue of type “array of N T” or “array of unknown bound of T” can be converted to an rvalue
of type “pointer to T.”
So the conversion doesn't have to happen like it pretty much always does in C (this lets functions overload or templates match on the array type).
This is also why in C you should avoid using array parameters in function prototypes/definitions (in my opinion - I'm not sure if there's any general agreement). They cause confusion and are a fiction anyway - use pointer parameters and the confusion might not go away entirely, but at least the parameter declaration isn't lying.
"Decay" refers to the implicit conversion of an expression from an array type to a pointer type. In most contexts, when the compiler sees an array expression it converts the type of the expression from "N-element array of T" to "pointer to T" and sets the value of the expression to the address of the first element of the array. The exceptions to this rule are when an array is an operand of either the sizeof or & operators, or the array is a string literal being used as an initializer in a declaration.
Assume the following code:
char a[80];
strcpy(a, "This is a test");
The expression a is of type "80-element array of char" and the expression "This is a test" is of type "15-element array of char" (in C; in C++ string literals are arrays of const char). However, in the call to strcpy(), neither expression is an operand of sizeof or &, so their types are implicitly converted to "pointer to char", and their values are set to the address of the first element in each. What strcpy() receives are not arrays, but pointers, as seen in its prototype:
char *strcpy(char *dest, const char *src);
This is not the same thing as an array pointer. For example:
char a[80];
char *ptr_to_first_element = a;
char (*ptr_to_array)[80] = &a;
Both ptr_to_first_element and ptr_to_array have the same value; the base address of a. However, they are different types and are treated differently, as shown below:
a[i] == ptr_to_first_element[i] == (*ptr_to_array)[i] != *ptr_to_array[i] != ptr_to_array[i]
Remember that the expression a[i] is interpreted as *(a+i) (which only works if the array type is converted to a pointer type), so both a[i] and ptr_to_first_element[i] work the same. The expression (*ptr_to_array)[i] is interpreted as *(*ptr_to_array+i). The expressions *ptr_to_array[i] and ptr_to_array[i] may lead to compiler warnings or errors depending on the context; they'll definitely do the wrong thing if you're expecting them to evaluate to a[i].
sizeof a == sizeof *ptr_to_array == 80
Again, when an array is an operand of sizeof, it's not converted to a pointer type.
sizeof *ptr_to_first_element == sizeof (char) == 1
sizeof ptr_to_first_element == sizeof (char *) == whatever the pointer size
is on your platform
ptr_to_first_element is a simple pointer to char.
Arrays, in C, have no value.
Wherever the value of an object is expected but the object is an array, the address of its first element is used instead, with type pointer to (type of array elements).
In a function, all parameters are passed by value (arrays are no exception). When you pass an array in a function it "decays into a pointer" (sic); when you compare an array to something else, again it "decays into a pointer" (sic); ...
void foo(int arr[]);
Function foo expects the value of an array. But, in C, arrays have no value! So foo gets instead the address of the first element of the array.
int arr[5];
int *ip = &(arr[1]);
if (arr == ip) { /* something; */ }
In the comparison above, arr has no value, so it becomes a pointer. It becomes a pointer to int. That pointer can be compared with the variable ip.
In the array indexing syntax you are used to seeing, again, the arr is 'decayed to a pointer'
arr[42];
/* same as *(arr + 42); */
/* same as *(&(arr[0]) + 42); */
The only times an array doesn't decay into a pointer are when it is the operand of the sizeof operator, or the & operator (the 'address of' operator), or as a string literal used to initialize a character array.
It's when array rots and is being pointed at ;-)
Actually, it's just that if you want to pass an array somewhere, but the pointer is passed instead (because who the hell would pass the whole array for you), people say that poor array decayed to pointer.
Array decaying means that, when an array is passed as a parameter to a function, it's treated identically to ("decays to") a pointer.
#include <stdio.h>
void do_something(int *array) {
// We don't know how big array is here, because it's decayed to a pointer.
printf("%zu\n", sizeof(array)); // always prints 4 on a 32-bit machine
}
int main (int argc, char **argv) {
int a[10];
int b[20];
int *c = a; // initialized so that passing it below is well defined
printf("%zu\n", sizeof(a)); //prints 40 on a 32-bit machine
printf("%zu\n", sizeof(b)); //prints 80 on a 32-bit machine
printf("%zu\n", sizeof(c)); //prints 4 on a 32-bit machine
do_something(a);
do_something(b);
do_something(c);
return 0;
}
There are two complications or exceptions to the above.
First, when dealing with multidimensional arrays in C and C++, only the first dimension is lost. This is because arrays are laid out contiguously in memory, so the compiler must know all but the first dimension to be able to calculate offsets into that block of memory.
void do_something(int array[][10])
{
// We don't know how big the first dimension is.
}
int main(int argc, char *argv[]) {
int a[5][10];
int b[20][10];
do_something(a);
do_something(b);
return 0;
}
Second, in C++, you can use templates to deduce the size of arrays. Microsoft uses this for the C++ versions of Secure CRT functions like strcpy_s, and you can use a similar trick to reliably get the number of elements in an array.
tl;dr: When you use an array you've defined, you'll actually be using a pointer to its first element.
Thus:
When you write arr[idx] you're really just saying *(arr + idx).
Functions never really take arrays as parameters, only pointers - either directly, when you specify an array parameter, or indirectly, if you pass a reference to an array.
Sort-of exceptions to this rule:
You can pass fixed-length arrays to functions within a struct.
sizeof() gives the size taken up by the array, not the size of a pointer.
Try this code
#include <stdio.h>
void f(double a[10]) {
/* a is really a double *, so this prints the size of a pointer */
printf("in function: %zu\n", sizeof(a));
printf("pointer size: %zu\n", sizeof(double *));
}
int main() {
double a[10];
printf("in main: %zu\n", sizeof(a));
f(a);
}
and you will see that the size of the array inside the function is not equal to the size of the array in main, but it is equal to the size of a pointer.
You probably heard that "arrays are pointers", but, this is not exactly true (the sizeof inside main prints the correct size). However, when passed, the array decays to pointer. That is, regardless of what the syntax shows, you actually pass a pointer, and the function actually receives a pointer.
In this case, the definition void f(double a[10]) is implicitly transformed by the compiler to void f(double *a). You could have equivalently declared the function parameter directly as double *a. You could have even written a[100] or a[1], instead of a[10], since it is never actually compiled that way (however, you obviously shouldn't, as it would confuse the reader).
Arrays are automatically passed by pointer in C. The rationale behind it can only be speculated about.
int a[5], int *a and int (*a)[5] are all glorified addresses, meaning that the compiler treats arithmetic and dereference operators on them differently depending on the type, so when they refer to the same address they are not treated the same by the compiler. int a[5] is different to the other 2 in that the address is implicit and does not manifest on the stack or the executable as part of the array itself; it is only used by the compiler to resolve certain arithmetic operations, like taking its address or pointer arithmetic. int a[5] is therefore an array as well as an implicit address, but as soon as you talk about the address itself and place it on the stack, the address itself is no longer an array, and can only be a pointer to an array or a decayed array i.e. a pointer to the first member of the array.
For instance, on int (*a)[5], the first dereference on a will produce an int[5], which immediately decays to an int * (so the same address, just a different type), and pointer arithmetic on a, i.e. a+1 or *(a+1), will be in terms of the size of an array of 5 ints (which is the data type it points to), and the second dereference will produce the int. On int a[5], however, the first dereference will produce the int and the pointer arithmetic will be in terms of the size of an int.
To a function, you can only pass int * and int (*)[5], and the function casts it to whatever the parameter type is, so within the function you have a choice whether to treat an address that is being passed as a decayed array or a pointer to an array (where the function has to specify the size of the array being passed). If you pass a to a function and a is defined int a[5], then as a resolves to an address, you are passing an address, and an address can only be a pointer type. In the function, the parameter it accesses is then an address on the stack or in a register, which can only be a pointer type and not an array type -- this is because it's an actual address on the stack and is therefore clearly not the array itself.
You lose the size of the array because the type of the parameter, being an address, is a pointer and not an array, and a pointer carries no array size, as can be seen when using sizeof, which works on the type of its operand. The parameter type int a[5] is allowed but is treated as int *. Arguably it should be disallowed outright, because it is misleading: it makes you think that the size information can be used. You can only keep that information by declaring the parameter as int (*a)[5], and then the function has to specify the size of the array, because the size in that type needs to be a compile-time constant.
I might be so bold as to think there are four (4) ways to pass an array as the function argument. Also here is the short but working code for your perusal.
#include <iostream>
#include <string>
#include <vector>
#include <cassert>
using namespace std;
// test data
// notice native array init with no copy aka "="
// not possible in C
const char* specimen[]{ __TIME__, __DATE__, __TIMESTAMP__ };
// ONE
// simple, dangerous and useless
template<typename T>
void as_pointer(const T* array) {
// a pointer
assert(array != nullptr);
}
// TWO
// for above const T array[] means the same
// but and also , minimum array size indication might be given too
// this also does not stop the array decay into T *
// thus size information is lost
template<typename T>
void by_value_no_size(const T array[0xFF]) {
// decayed to a pointer
assert( array != nullptr );
}
// THREE
// size information is preserved
// but pointer is asked for
template<typename T, size_t N>
void pointer_to_array(const T (*array)[N])
{
// dealing with native pointer
assert( array != nullptr );
}
// FOUR
// no C equivalent
// array by reference
// size is preserved
template<typename T, size_t N>
void reference_to_array(const T (&array)[N])
{
// array is not a pointer here
// it is (almost) a container
// most of the std:: lib algorithms
// do work on array reference, for example
// range for requires std::begin() and std::end()
// on the type passed as range to iterate over
for (auto && elem : array )
{
cout << endl << elem ;
}
}
int main()
{
// ONE
as_pointer(specimen);
// TWO
by_value_no_size(specimen);
// THREE
pointer_to_array(&specimen);
// FOUR
reference_to_array( specimen ) ;
}
I might also think this shows the superiority of C++ vs C. At least in reference (pun intended) of passing an array by reference.
Of course there are extremely strict projects with no heap allocation, no exceptions and no std:: lib. C++ native array handling is mission critical language feature, one might say.
What is array to pointer decay? Is there any relation to array pointers?
It's said that arrays "decay" into pointers. A C++ array declared as int numbers [5] cannot be re-pointed, i.e. you can't say numbers = 0x5a5aff23. More importantly the term decay signifies loss of type and dimension; numbers decay into int* by losing the dimension information (count 5) and the type is not int [5] any more. Look here for cases where the decay doesn't happen.
If you're passing an array by value, what you're really doing is copying a pointer - a pointer to the array's first element is copied to the parameter (whose type should also be a pointer the array element's type). This works due to array's decaying nature; once decayed, sizeof no longer gives the complete array's size, because it essentially becomes a pointer. This is why it's preferred (among other reasons) to pass by reference or pointer.
Three ways to pass in an array1:
void by_value(const T* array) // const T array[] means the same
void by_pointer(const T (*array)[U])
void by_reference(const T (&array)[U])
The last two will give proper sizeof info, while the first one won't since the array argument has decayed to be assigned to the parameter.
1 The constant U should be known at compile-time.
Arrays are basically the same as pointers in C/C++, but not quite. Once you convert an array:
const int a[] = { 2, 3, 5, 7, 11 };
into a pointer (which works without casting, and therefore can happen unexpectedly in some cases):
const int* p = a;
you lose the ability of the sizeof operator to count elements in the array:
assert( sizeof(p) != sizeof(a) ); // sizes are not equal
This lost ability is referred to as "decay".
For more details, check out this article about array decay.
Here's what the standard says (C99 6.3.2.1/3 - Other operands - Lvalues, arrays, and function designators):
Except when it is the operand of the sizeof operator or the unary & operator, or is a
string literal used to initialize an array, an expression that has type ‘‘array of type’’ is
converted to an expression with type ‘‘pointer to type’’ that points to the initial element of
the array object and is not an lvalue.
This means that pretty much anytime the array name is used in an expression, it is automatically converted to a pointer to the 1st item in the array.
Note that function names act in a similar way, but function pointers are used far less and in a much more specialized way that it doesn't cause nearly as much confusion as the automatic conversion of array names to pointers.
The C++ standard (4.2 Array-to-pointer conversion) loosens the conversion requirement to (emphasis mine):
An lvalue or rvalue of type “array of N T” or “array of unknown bound of T” can be converted to an rvalue
of type “pointer to T.”
So the conversion doesn't have to happen like it pretty much always does in C (this lets functions overload or templates match on the array type).
This is also why in C you should avoid using array parameters in function prototypes/definitions (in my opinion - I'm not sure if there's any general agreement). They cause confusion and are a fiction anyway - use pointer parameters and the confusion might not go away entirely, but at least the parameter declaration isn't lying.
"Decay" refers to the implicit conversion of an expression from an array type to a pointer type. In most contexts, when the compiler sees an array expression it converts the type of the expression from "N-element array of T" to "pointer to T" and sets the value of the expression to the address of the first element of the array. The exceptions to this rule are when an array is an operand of either the sizeof or & operators, or the array is a string literal being used as an initializer in a declaration.
Assume the following code:
char a[80];
strcpy(a, "This is a test");
The expression a is of type "80-element array of char" and the expression "This is a test" is of type "15-element array of char" (in C; in C++ string literals are arrays of const char). However, in the call to strcpy(), neither expression is an operand of sizeof or &, so their types are implicitly converted to "pointer to char", and their values are set to the address of the first element in each. What strcpy() receives are not arrays, but pointers, as seen in its prototype:
char *strcpy(char *dest, const char *src);
This is not the same thing as an array pointer. For example:
char a[80];
char *ptr_to_first_element = a;
char (*ptr_to_array)[80] = &a;
Both ptr_to_first_element and ptr_to_array have the same value; the base address of a. However, they are different types and are treated differently, as shown below:
a[i] == ptr_to_first_element[i] == (*ptr_to_array)[i] != *ptr_to_array[i] != ptr_to_array[i]
Remember that the expression a[i] is interpreted as *(a+i) (which only works if the array type is converted to a pointer type), so both a[i] and ptr_to_first_element[i] work the same. The expression (*ptr_to_array)[i] is interpreted as *(*a+i). The expressions *ptr_to_array[i] and ptr_to_array[i] may lead to compiler warnings or errors depending on the context; they'll definitely do the wrong thing if you're expecting them to evaluate to a[i].
sizeof a == sizeof *ptr_to_array == 80
Again, when an array is an operand of sizeof, it's not converted to a pointer type.
sizeof *ptr_to_first_element == sizeof (char) == 1
sizeof ptr_to_first_element == sizeof (char *) == whatever the pointer size
is on your platform
ptr_to_first_element is a simple pointer to char.
Arrays, in C, have no value.
Wherever the value of an object is expected but the object is an array, the address of its first element is used instead, with type pointer to (type of array elements).
In a function, all parameters are passed by value (arrays are no exception). When you pass an array in a function it "decays into a pointer" (sic); when you compare an array to something else, again it "decays into a pointer" (sic); ...
void foo(int arr[]);
Function foo expects the value of an array. But, in C, arrays have no value! So foo gets instead the address of the first element of the array.
int arr[5];
int *ip = &(arr[1]);
if (arr == ip) { /* something; */ }
In the comparison above, arr has no value, so it becomes a pointer. It becomes a pointer to int. That pointer can be compared with the variable ip.
In the array indexing syntax you are used to seeing, again, the arr is 'decayed to a pointer'
arr[42];
/* same as *(arr + 42); */
/* same as *(&(arr[0]) + 42); */
The only times an array doesn't decay into a pointer are when it is the operand of the sizeof operator, or the & operator (the 'address of' operator), or as a string literal used to initialize a character array.
It's when array rots and is being pointed at ;-)
Actually, it's just that if you want to pass an array somewhere, but the pointer is passed instead (because who the hell would pass the whole array for you), people say that poor array decayed to pointer.
Array decaying means that, when an array is passed as a parameter to a function, it's treated identically to ("decays to") a pointer.
void do_something(int *array) {
// We don't know how big array is here, because it's decayed to a pointer.
printf("%i\n", sizeof(array)); // always prints 4 on a 32-bit machine
}
int main (int argc, char **argv) {
int a[10];
int b[20];
int *c;
printf("%zu\n", sizeof(a)); //prints 40 on a 32-bit machine
printf("%zu\n", sizeof(b)); //prints 80 on a 32-bit machine
printf("%zu\n", sizeof(c)); //prints 4 on a 32-bit machine
do_something(a);
do_something(b);
do_something(c);
}
There are two complications or exceptions to the above.
First, when dealing with multidimensional arrays in C and C++, only the first dimension is lost. This is because arrays are layed out contiguously in memory, so the compiler must know all but the first dimension to be able to calculate offsets into that block of memory.
void do_something(int array[][10])
{
// We don't know how big the first dimension is.
}
int main(int argc, char *argv[]) {
int a[5][10];
int b[20][10];
do_something(a);
do_something(b);
return 0;
}
Second, in C++, you can use templates to deduce the size of arrays. Microsoft uses this for the C++ versions of Secure CRT functions like strcpy_s, and you can use a similar trick to reliably get the number of elements in an array.
tl;dr: When you use an array you've defined, you'll actually be using a pointer to its first element.
Thus:
When you write arr[idx] you're really just saying *(arr + idx).
functions never really take arrays as parameters, only pointers - either directly, when you specify an array parameter, or indirectly, if you pass a reference to an array.
Sort-of exceptions to this rule:
You can pass fixed-length arrays to functions within a struct.
sizeof() gives the size taken up by the array, not the size of a pointer.
Try this code
#include <stdio.h>
void f(double a[10]) {
printf("in function: %zu\n", sizeof(a));
printf("pointer size: %zu\n", sizeof(double *));
}
int main() {
double a[10];
printf("in main: %zu\n", sizeof(a));
f(a);
}
and you will see that the size of the array inside the function is not equal to the size of the array in main, but it is equal to the size of a pointer.
You have probably heard that "arrays are pointers", but this is not exactly true (the sizeof inside main prints the correct size). However, when passed, the array decays to a pointer. That is, regardless of what the syntax shows, you actually pass a pointer, and the function actually receives a pointer.
In this case, the definition void f(double a[10]) is implicitly transformed by the compiler to void f(double *a). You could have equivalently declared the function parameter directly as double *a. You could even have written a[100] or a[1] instead of a[10], since the bound is ignored by the compiler (however, you obviously shouldn't do that; it would confuse the reader).
Arrays are automatically passed by pointer in C. The rationale behind it can only be speculated.
int a[5], int *a and int (*a)[5] are all glorified addresses, but the compiler treats arithmetic and dereference operators on them differently depending on the type, so even when they refer to the same address they are not treated the same. int a[5] differs from the other two in that the address is implicit: it does not exist on the stack or in the executable as part of the array itself, and is only used by the compiler to resolve certain operations, such as taking the array's address or doing pointer arithmetic. int a[5] is therefore an array as well as an implicit address, but as soon as you store the address itself somewhere (for example on the stack), that stored address is no longer an array; it can only be a pointer to an array or a decayed array, i.e. a pointer to the first member of the array.
For instance, with int (*a)[5], the first dereference of a produces an int[5] (the array itself, at the same address), which then decays to int * in most expressions; pointer arithmetic on a, i.e. a+1 or *(a+1), is in units of the size of an array of 5 ints (the type it points to); and the second dereference produces the int. With int a[5], however, the first dereference produces the int and the pointer arithmetic is in units of the size of an int.
To a function, you can only pass int * and int (*)[5], and the function casts it to whatever the parameter type is, so within the function you have a choice whether to treat an address that is being passed as a decayed array or a pointer to an array (where the function has to specify the size of the array being passed). If you pass a to a function and a is defined int a[5], then as a resolves to an address, you are passing an address, and an address can only be a pointer type. In the function, the parameter it accesses is then an address on the stack or in a register, which can only be a pointer type and not an array type -- this is because it's an actual address on the stack and is therefore clearly not the array itself.
You lose the size of the array because the parameter, being an address, has a pointer type rather than an array type, and a pointer type carries no array size. You can see this with sizeof, which works on the type of its operand. The parameter type int a[5] is accepted in place of int *a and silently treated as int *. Arguably it should be disallowed outright, because it misleadingly suggests that the size information can be used. To actually use the size you have to cast the parameter to int (*)[5] or declare it as int (*a)[5], and then the function itself has to specify the size of the array: the size cannot be passed along with the argument, because it needs to be a compile-time constant.
If I may be so bold, I think there are four (4) ways to pass an array as a function argument. Here is some short but working code for your perusal.
#include <iostream>
#include <string>
#include <vector>
#include <cassert>
using namespace std;
// test data
// notice native array init with no copy aka "="
// not possible in C
const char* specimen[]{ __TIME__, __DATE__, __TIMESTAMP__ };
// ONE
// simple, dangerous and useless
template<typename T>
void as_pointer(const T* array) {
// a pointer
assert(array != nullptr);
}
// TWO
// same as above: const T array[] means the same as const T* array
// a size may be written inside the brackets, but the compiler ignores it
// and it does not stop the array decaying into T *
// thus size information is lost
template<typename T>
void by_value_no_size(const T array[0xFF]) {
// decayed to a pointer
assert( array != nullptr );
}
// THREE
// size information is preserved
// but pointer is asked for
template<typename T, size_t N>
void pointer_to_array(const T (*array)[N])
{
// dealing with native pointer
assert( array != nullptr );
}
// FOUR
// no C equivalent
// array by reference
// size is preserved
template<typename T, size_t N>
void reference_to_array(const T (&array)[N])
{
// array is not a pointer here
// it is (almost) a container
// most of the std:: lib algorithms
// do work on array reference, for example
// range for requires std::begin() and std::end()
// on the type passed as range to iterate over
for (auto && elem : array )
{
cout << endl << elem ;
}
}
int main()
{
// ONE
as_pointer(specimen);
// TWO
by_value_no_size(specimen);
// THREE
pointer_to_array(&specimen);
// FOUR
reference_to_array( specimen ) ;
}
I might also think this shows the superiority of C++ over C, at least with reference (pun intended) to passing an array by reference.
Of course there are extremely strict projects with no heap allocation, no exceptions and no std:: lib. For those, C++ native array handling is a mission-critical language feature, one might say.
I know that arrays in C are just pointers to sequentially stored data. But what differences does the difference in notation between [] and * imply? I mean in ALL possible usage contexts.
For example:
char c[] = "test";
if you provide this instruction in a function body it will allocate the string on a stack while
char* c = "test";
will point to a data (readonly) segment.
Can you list all the differences between these two notations in ALL usage contexts to form a clear general view.
According to the C99 standard:
An array type describes a contiguously allocated nonempty set of
objects with a particular member object type, called the element
type.
Array types are characterized by their element type and by
the number of elements in the array. An array type is said to be
derived from its element type, and if its element type is T, the array
type is sometimes called array of T. The construction of an array
type from an element type is called array type derivation.
A pointer type may be derived from a function type, an object type, or
an incomplete type, called the referenced type. A pointer type
describes an object whose value provides a reference to an entity of
the referenced type. A pointer type derived from the referenced type T
is sometimes referred to as a pointer to T. The construction of a pointer
type from a referenced type is called pointer type derivation.
According to the standard, the declarations…
char s[] = "abc", t[3] = "abc";
char s[] = { 'a', 'b', 'c', '\0' }, t[] = { 'a', 'b', 'c' };
…are identical. The contents of the arrays are modifiable. On the other hand, the declaration…
const char *p = "abc";
…defines p with the type as pointer to constant char and initializes it to point to an object with type constant array of char (in C++) with length 4 whose elements are initialized with a character string literal. If an attempt is made to use p to modify the contents of the array, the behavior is undefined.
According to 6.5.2.1 Array subscripting, dereferencing and array subscripting are identical:
The definition of the subscript operator [] is that E1[E2] is
identical to (*((E1)+(E2))).
The differences of arrays vs. pointers are:
pointer has no information of the memory size behind it (there is no portable way to get it)
an array of incomplete type cannot be constructed
a pointer type may be derived from an incomplete type
a pointer can define a recursive structure (this one is the consequence of the previous two)
More helpful information on the subject can be found at http://www.cplusplus.com/forum/articles/9/
char c[] = "test";
This will create an array containing the string test so you can modify/change any character, say
c[2] = 'p';
but,
char *c = "test";
Here c points to a string literal, whose characters must be treated as const char.
Modifying a string literal is undefined behavior; in practice it usually gives us a segfault. So
c[2] = 'p';
is illegal now and typically segfaults.
char [] denotes the type "array of unknown bound of char", while char * denotes the type "pointer to char". As you've observed, when a definition of a variable of type "array of unknown bound of char" is initialised with a string literal, the type is converted to "array[N] of char" where N is the appropriate size. The same applies in general to initialisation from array aggregate:
int arr[] = { 0, 1, 2 };
arr is converted to type "array[3] of int".
In a user-defined type definition (struct, class or union), array-of-unknown-bound types are prohibited in C++, although in some versions of C they are allowed as the last member of a struct, where they can be used to access allocated memory past the end of the struct; this usage is called "flexible arrays".
Recursive type construction is another difference; one can construct pointers to and arrays of char * (e.g. char **, char *[10]), but an array of unknown bound cannot be used as the element type of another array: one cannot write char [10][] (although char (*)[] and char [][10] are fine).
Finally, cv-qualification operates differently; given typedef char *ptr_to_char and typedef char array_of_unknown_bound_of_char[], cv-qualifying the pointer version will behave as expected, while cv-qualifying the array version will migrate the cv-qualification to the element type: that is, const array_of_unknown_bound_of_char is equivalent to const char [] and not the fictional char (const) []. This means that in a function definition, where array-to-pointer decay operates on the parameters prior to constructing the prototype,
void foo (int const a[]) {
a = 0;
}
is legal; there is no way to make the array-of-unknown-bound parameter non-modifiable.
The whole lot becomes clear if you know that declaring a pointer variable does not create a variable of the type it points at. It only creates a pointer variable.
So, in practice, if you need a string then you need to specify an array of characters and a pointer can be used later on.
Actually, in most expressions an array name behaves like a constant pointer to its first element, although the array itself is not a pointer.
Also, char c[] allocates memory for the array, whose base address is c itself. No separate memory is allocated for storing that address.
Writing char *c allocates memory for the string whose base address is stored in c. Also, a separate memory location is used to store c.
As the heading says, What is the difference between
char a[] = ?string?; and
char *p = ?string?;
I was asked this question in an interview.
I don't even understand the statement.
char a[] = ?string?
Here what is ? operator? Is it a part of a string or it has some specific meaning?
The ? seems to be a typo, it is not semantically valid. So the answer assumes the ? is a typo and explains what probably the interviewer actually meant to ask.
Both are distinctly different, for a start:
The first creates an array.
The second creates a pointer.
Read on for more detailed explanation:
The Array version:
char a[] = "string";
Creates an array that is large enough to hold the string literal "string", including its null terminator. The array a is initialized with the string literal "string". The array can be modified at a later time. Also, the array's size is known at compile time, so the sizeof operator can be used to determine its size.
The pointer version:
char *p = "string";
Creates a pointer that points to the string literal "string". No copy of the characters is made, so this is cheaper than the array version, but the string pointed to by the pointer should not be changed, because string literals are typically placed in read-only memory (where exactly is implementation-defined). Modifying such a string literal results in Undefined Behavior.
In fact C++03 deprecates[Ref 1] the use of a string literal without the const keyword. So the declaration should be:
const char *p = "string";
Also, you need to use the strlen() function, and not sizeof, to find the length of the string, since the sizeof operator will just give you the size of the pointer variable.
Which version is better and which one shall I use?
Depends on the Usage.
If you do not need to make any changes to the string, use the pointer version.
If you intend to change the data, use the array version.
Note: The following is specific to C, not C++.
Note that the use of a string literal without the const keyword is perfectly valid in C.
However, modifying a string literal is still Undefined Behavior in C[Ref 2].
This brings up an interesting question,
What is the difference between char* and const char* when used with string literals in C?
For Standardese Fans:
[Ref 1]C++03 Standard: §4.2/2
A string literal (2.13.4) that is not a wide string literal can be converted to an rvalue of type “pointer to char”; a wide string literal can be converted to an rvalue of type “pointer to wchar_t”. In either case, the result is a pointer to the first element of the array. This conversion is considered only when there is an explicit appropriate pointer target type, and not when there is a general need to convert from an lvalue to an rvalue. [Note: this conversion is deprecated. See Annex D. ] For the purpose of ranking in overload resolution (13.3.3.1.1), this conversion is considered an array-to-pointer conversion followed by a qualification conversion (4.4). [Example: "abc" is converted to “pointer to const char” as an array-to-pointer conversion, and then to “pointer to char” as a qualification conversion. ]
C++11 simply removes the above quotation, which implies that such code is ill-formed in C++11.
[Ref 2]C99 standard 6.4.5/5 "String Literals - Semantics":
In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence; for wide string literals, the array elements have type wchar_t, and are initialized with the sequence of wide characters...
It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
The first one is an array, the other is a pointer.
The array declaration char a[6]; requests that space for six characters be set aside, to be known by the name a. That is, there is a location named a at which six characters can sit. The pointer declaration char *p; on the other hand, requests a place which holds a pointer. The pointer is to be known by the name p, and can point to any char (or contiguous array of chars) anywhere.
The statements
char a[] = "string";
char *p = "string";
would result in data structures which could be represented like this:
+---+---+---+---+---+---+----+
a: | s | t | r | i | n | g | \0 |
+---+---+---+---+---+---+----+
+-----+ +---+---+---+---+---+---+---+
p: | *======> | s | t | r | i | n | g |\0 |
+-----+ +---+---+---+---+---+---+---+
It is important to realize that a reference like x[3] generates different code depending on whether x is an array or a pointer. Given the declarations above, when the compiler sees the expression a[3], it emits code to start at the location a, move three elements past it, and fetch the character there. When it sees the expression p[3], it emits code to start at the location p, fetch the pointer value there, add three element sizes to the pointer, and finally fetch the character pointed to. In the example above, both a[3] and p[3] happen to be the character i, but the compiler gets there differently.
Source: comp.lang.c FAQ list · Question 6.2
char a[] = "string";
This allocates the string on the stack.
char *p = "string";
This creates a pointer on the stack that points to the literal in the data segment of the process.
? is whoever wrote it not knowing what they were doing.
Stack, heap, data segment (and BSS) and text segment are the four segments of process memory. All local variables are on the stack. Memory dynamically allocated with malloc and calloc is on the heap. All global and static variables are in the data segment. The text segment holds the compiled code of the program and some constants.
Of these 4 segments, the text segment is the READ ONLY segment and the other three are READ and WRITE.
char a[] = "string"; - This statement will allocate 7 bytes on the stack (because it is a local variable) and it will keep all 6 characters (s, t, r, i, n, g) plus the null character (\0) at the end.
char *p = "string"; - This statement will allocate 4 bytes (on a 32-bit machine) on the stack (because this is also a local variable) and it will hold a pointer to the constant string whose value is "string". The 7 bytes of the constant string (6 characters plus the terminator) will be in the text segment. This is a constant value. The pointer variable p just points to that string.
Now a[0] (the index can be 0 to 5) accesses the first character of the string that is on the stack. So we can also write at this position: a[0] = 'x'. This operation is allowed because we have READ WRITE access to the stack.
But p[0] = 'x' is undefined behavior and will typically crash, because we have only READ access to the text segment. A segmentation fault will happen if we do any write to the text segment.
But you can change the value of the variable p itself, because it is a local variable on the stack, like below:
char *p = "string";
printf("%s", p);
p = "start";
printf("%s", p);
This is allowed. Here we are changing the address stored in the pointer variable p to the address of the string "start" (again, "start" is also read-only data in the text segment). If you want to modify the characters that p points to, use dynamically allocated memory.
char *p = NULL;
p = malloc(sizeof(char)*7);
strcpy(p, "string");
Now p[0] = 'x' operation is allowed, because now we are writing in heap.
char *p = "string"; creates a pointer to read-only memory where string literal "string" is stored. Trying to modify string that p points to leads to undefined behaviour.
char a[] = "string"; creates an array and initializes its content by using string literal "string".
They do differ as to where the memory is stored. Ideally the second one should use const char *.
The first one
char buf[] = "hello";
creates an automatic buffer big enough to hold the characters and copies them in (including the null terminator).
The second one
const char * buf = "hello";
should use const and simply creates a pointer to memory that usually lives in static storage, which must not be modified.
The converse (of the fact you can modify the first safely and not the second) is that it is safe to return the second pointer from a function, but not the first. This is because the second one will remain a valid memory pointer outside the scope of the function, the first will not.
const char * sayHello()
{
const char * buf = "hello";
return buf; // valid
}
const char * sayHelloBroken()
{
char buf[] = "hello";
return buf; // invalid
}
a declares an array of char values -- an array of chars which is terminated.
p declares a pointer, which refers to an immutable, terminated, C string, whose exact storage location is implementation-defined. Note that this should be const-qualified (e.g. const char *p = "string";).
If you print it out using std::cout << "a: " << sizeof(a) << "\np: " << sizeof(p) << std::endl;, you will see the difference in their sizes (note: values may vary by system):
a: 7
p: 8
Here what is ? operator? Is it a part of a string or it has some specific meaning?
char a[] = ?string?
I assume they were once double quotes "string", which potentially were converted to "smart quotes", then could not be represented as such along the way, and were converted to ?.
C and C++ have very similar Pointer to Array relationships...
I can't speak for the exact memory locations of the two statements you are asking about, but I found these articles interesting and useful for understanding some of the differences between a char pointer declaration and a char array declaration.
For clarity:
C Pointer and Array relationship
C++ Pointer to an Array
I think it's important to remember that an array name, in C and C++, converts to a pointer to the first element of the array in most expressions (the array itself is not a pointer). Consequently you can perform pointer arithmetic on the array.
char *p = "string"; <--- This is a pointer that points to the first address of a character string.
the following is also possible:
char *p;
char a[] = "string";
p = a;
At this point p references the first memory address of a (the address of the first element)
and so *p == 's'
*(++p) == 't' and so on (or *(p+1) == 't' without moving p)
and the same arithmetic works for a: *(a+1) also equals 't'. However, a++ does not compile, because an array is not a modifiable pointer.
I have a question about the array name a
int a[10]
How is the array name defined in C++? As a constant pointer? Is it defined like this, or can we just look at it like this? What operations can be applied to the name?
The C++ standard defines what an array is and its behaviour. Take a look in the index. It's not a pointer, const or otherwise, and it's not anything else, it's an array.
To see a difference:
int a[10];
int *const b = a;
std::cout << sizeof(a); // prints "40" on my machine.
std::cout << sizeof(b); // prints "4" on my machine.
Clearly a and b are not the same type, since they have different sizes.
In most contexts, an array name "decays" to a pointer to its own first element. You can think of this as an automatic conversion. The result is an rvalue, meaning that it's "just" a pointer value, and can't be assigned to, similar to when a function name decays to a function pointer. Doesn't mean it's "const" as such, but it's not assignable.
So an array "is" a pointer much like a function "is" a function pointer, or a long "is" an int. That is to say, it isn't really, but you can use it as one in most contexts thanks to the conversion.
An array name is not a constant pointer - however it acts like one in so many contexts (it converts to one on sight pretty much) that for most purposes it is.
From the C99 standard, 6.3.2.1/3 "Other operands/Lvalues, arrays, and function designators":
Except when it is the operand of the sizeof operator or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue.