C++ assign to array vs assign to object [duplicate] - c++

I understand that memberwise assignment of arrays is not supported, such that the following will not work:
int num1[3] = {1,2,3};
int num2[3];
num2 = num1; // "error: invalid array assignment"
I just accepted this as fact, figuring that the aim of the language is to provide an open-ended framework, and let the user decide how to implement something such as the copying of an array.
However, the following does work:
struct myStruct { int num[3]; };
struct myStruct struct1 = {{1,2,3}};
struct myStruct struct2;
struct2 = struct1;
The array num[3] is member-wise assigned from its instance in struct1, into its instance in struct2.
Why is member-wise assignment of arrays supported for structs, but not in general?
edit: Roger Pate's comment in the thread std::string in struct - Copy/assignment issues? seems to point in the general direction of the answer, but I don't know enough to confirm it myself.
edit 2: Many excellent responses. I chose Luther Blissett's because I was mostly wondering about the philosophical or historical rationale behind the behavior, but James McNellis's reference to the related spec documentation was useful as well.

Here's my take on it:
The Development of the C Language offers some insight into the evolution of the array type in C:
http://cm.bell-labs.com/cm/cs/who/dmr/chist.html
I'll try to outline the array thing:
C's forerunners B and BCPL had no distinct array type; a declaration like:
auto V[10] (B)
or
let V = vec 10 (BCPL)
would declare V to be an (untyped) pointer, initialized to point to an unused region of 10 "words" of memory. B already used * for pointer dereferencing and had the [] shorthand notation: *(V+i) meant V[i], just as in C/C++ today. However, V is not an array; it is still a pointer that has to point to some memory. This caused trouble when Dennis Ritchie tried to extend B with struct types. He wanted arrays to be part of the structs, as in C today:
struct {
int inumber;
char name[14];
};
But with the B/BCPL concept of arrays as pointers, this would have required the name field to contain a pointer that had to be initialized at runtime to a 14-byte memory region within the struct. The initialization/layout problem was eventually solved by giving arrays special treatment: the compiler would track the location of arrays in structures, on the stack, etc., without actually requiring the pointer to the data to materialize, except in expressions that involve the arrays. This treatment allowed almost all B code to still run and is the source of the "arrays convert to pointers if you look at them" rule. It is a compatibility hack which turned out to be very handy, because it allowed arrays of open size etc.
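A small C++ sketch of how that rule looks today, assuming a hypothetical struct `Entry` modeled on the `inumber`/`name[14]` example: the array member is stored inline with no hidden pointer field, and a pointer only appears when the array is used in an expression.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical struct modeled on the inumber/name[14] example above.
struct Entry {
    int inumber;
    char name[14];
};

// The member really has array type (char[14]), not pointer type...
static_assert(std::is_same<decltype(Entry::name), char[14]>::value,
              "name has array type");
// ...and all 14 chars live inside the struct object itself.
static_assert(offsetof(Entry, name) + sizeof(Entry::name) <= sizeof(Entry),
              "the array is laid out inline");

// The array-to-pointer conversion happens here, when the array is used
// in an expression: the result points at the first character.
const char* name_ptr(const Entry& e) {
    return e.name;
}

void demo_inline_array() {
    Entry e{};
    const char* base = reinterpret_cast<const char*>(&e);
    assert(name_ptr(e) == &e.name[0]);
    assert(name_ptr(e) == base + offsetof(Entry, name));
}
```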
And here's my guess why arrays can't be assigned: since arrays were pointers in B, you could simply write:
auto V[10];
V=V+5;
to rebase an "array". This was now meaningless, because the base of an array variable was no longer an lvalue. So this assignment was disallowed, which helped to catch the few programs that did this rebasing on declared arrays. And then this notion stuck: as arrays were never designed to be first-class citizens of the C type system, they were mostly treated as special beasts that become pointers if you use them. And from a certain point of view (which ignores that C arrays are a botched hack), disallowing array assignment still makes some sense: an open array or an array function parameter is treated as a pointer without size information. The compiler doesn't have the information to generate an array assignment for them, and the pointer assignment was required for compatibility reasons. Introducing array assignment for declared arrays would have introduced bugs through spurious assignments (is a=b a pointer assignment or an element-wise copy?) and other trouble (how do you pass an array by value?) without actually solving a problem. Just make everything explicit with memcpy!
/* Example of how array assignment would make things even weirder in C/C++
if we didn't want to break existing code. The lines marked NEW! are hypothetical
and do not compile today. It's actually better to leave things as they are...
*/
typedef int vec[3];
void f(vec a, vec b)
{
vec x,y;
a=b; // pointer assignment
x=y; // NEW! element-wise assignment
a=x; // pointer assignment
x=a; // NEW! element-wise assignment
}
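Following the advice to "make everything explicit with memcpy", here is a minimal C++ sketch of the two usual ways to do what `num2 = num1` cannot. The function names are hypothetical, and the element count is fixed at 3 to match the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <iterator>

// Byte-wise copy; sizeof src is 3 * sizeof(int) because src is a reference to an array.
void copy_with_memcpy(int (&dst)[3], const int (&src)[3]) {
    std::memcpy(dst, src, sizeof src);
}

// Element-wise copy; unlike memcpy, this also works for non-trivially-copyable element types.
void copy_with_std_copy(int (&dst)[3], const int (&src)[3]) {
    std::copy(std::begin(src), std::end(src), std::begin(dst));
}

void demo_explicit_copy() {
    int num1[3] = {1, 2, 3};
    int num2[3] = {};
    copy_with_memcpy(num2, num1);
    assert(num2[0] == 1 && num2[1] == 2 && num2[2] == 3);

    int num3[3] = {};
    copy_with_std_copy(num3, num1);
    assert(std::equal(std::begin(num1), std::end(num1), std::begin(num3)));
}
```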
This didn't change when a revision of C in 1978 added struct assignment ( http://cm.bell-labs.com/cm/cs/who/dmr/cchanges.pdf ). Even though records were distinct types in C, it was not possible to assign them in early K&R C. You had to copy them member by member or with memcpy, and you could pass only pointers to them as function parameters. Assignment (and parameter passing) was now simply defined as the memcpy of the struct's raw memory, and since this couldn't break existing code it was readily adopted. As an unintended side effect, this implicitly introduced a kind of array assignment, but it happened inside a structure, so it couldn't really introduce problems with the way arrays were used.
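A short C++ check of that side effect, reusing the question's `myStruct`. Assigning the struct copies the embedded array, and the target ends up with its own independent copy of the elements.

```cpp
#include <cassert>

// Same shape as the question's myStruct.
struct myStruct { int num[3]; };

void demo_struct_assign() {
    myStruct struct1 = {{1, 2, 3}};
    myStruct struct2 = {};
    struct2 = struct1;              // the embedded array is copied along with the struct
    assert(struct2.num[2] == 3);

    struct2.num[0] = 42;            // struct2 owns its own elements...
    assert(struct1.num[0] == 1);    // ...so struct1 is unchanged
}
```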

Concerning the assignment operators, the C++ standard says the following (C++03 §5.17/1):
There are several assignment operators... all require a modifiable lvalue as their left operand
An array is not a modifiable lvalue.
However, assignment to a class type object is defined specially (§5.17/4):
Assignment to objects of a class is defined by the copy assignment operator.
So, we look to see what the implicitly-declared copy assignment operator for a class does (§12.8/13):
The implicitly-defined copy assignment operator for class X performs memberwise assignment of its subobjects. ... Each subobject is assigned in the manner appropriate to its type:
...
-- if the subobject is an array, each element is assigned, in the manner appropriate to the element type
...
So, for a class type object, arrays are copied correctly. Note that if you provide a user-declared copy assignment operator, you cannot take advantage of this, and you'll have to copy the array element-by-element.
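A minimal sketch of that note, with hypothetical names. With the implicit operator the array member is copied for free. Once a copy assignment operator is user-declared, copying the array becomes that operator's job, done here with `std::copy`.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

// Implicitly-defined copy assignment: the array member is assigned element by element.
struct Implicit {
    int num[3];
};

// User-declared copy assignment: nothing is copied unless the operator does it.
struct UserDeclared {
    int num[3];
    UserDeclared& operator=(const UserDeclared& other) {
        // Leaving this line out would leave num unchanged after assignment.
        std::copy(std::begin(other.num), std::end(other.num), std::begin(num));
        return *this;
    }
};

void demo_copy_assign() {
    Implicit a = {{1, 2, 3}}, b = {{0, 0, 0}};
    b = a;
    assert(b.num[1] == 2);

    UserDeclared c = {{4, 5, 6}}, d = {{0, 0, 0}};
    d = c;
    assert(d.num[2] == 6);
}
```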
The reasoning is similar in C (C99 §6.5.16/2):
An assignment operator shall have a modifiable lvalue as its left operand.
And §6.3.2.1/1:
A modifiable lvalue is an lvalue that does not have array type... [other constraints follow]
In C, assignment is much simpler than in C++ (§6.5.16.1/2):
In simple assignment (=), the value of the right operand is converted to the type of the assignment expression and replaces the value stored in the object designated by the left operand.
For assignment of struct-type objects, the left and right operands must have the same type, so the value of the right operand is simply copied into the left operand.
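The same rules can be observed in C++ with type traits (a C++ sketch, not a C one; the struct names `A` and `B` are hypothetical). An array lvalue is not assignable, a struct wrapping the array is, and two structs with identical layout but different types cannot be assigned to each other.

```cpp
#include <type_traits>

struct A { int num[3]; };
struct B { int num[3]; };   // same layout as A, but a distinct type

// An array lvalue cannot appear on the left of =, even with a same-typed right side.
static_assert(!std::is_assignable<int (&)[3], const int (&)[3]>::value,
              "arrays are not assignable");
// A class wrapping the same array gets an implicit copy assignment operator.
static_assert(std::is_copy_assignable<A>::value, "A is assignable");
// The operands must have the same type: a B cannot be assigned to an A.
static_assert(!std::is_assignable<A&, const B&>::value,
              "distinct struct types are not assignable to each other");
```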

The FAQ at http://www2.research.att.com/~bs/bs_faq2.html has a section on array assignment:
The two fundamental problems with arrays are that
an array doesn't know its own size
the name of an array converts to a pointer to its first element at the slightest provocation
And I think this is the fundamental difference between arrays and structs. An array variable is a low-level data element with limited self-knowledge. Fundamentally, it's a chunk of memory and a way to index into it.
So once an array has decayed to a pointer, the compiler can't tell the difference between int a[10] and int b[20].
Structs, however, do not have the same ambiguity.
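A sketch of that ambiguity in C++, with hypothetical function names. Once `int a[10]` and `int b[20]` have decayed, a function cannot tell them apart, and the bound written in the parameter is ignored. Only a reference-to-array parameter, which binds before decay, still sees the size.

```cpp
#include <cassert>
#include <cstddef>

// The [10] is decoration: the parameter is really `const int*`,
// so an int array of any length can be passed.
int first_element(const int a[10]) { return a[0]; }

// A reference-to-array parameter binds before decay, so N is still known.
template <std::size_t N>
std::size_t element_count(const int (&)[N]) { return N; }

void demo_size_loss() {
    int a[10] = {1};
    int b[20] = {2};
    assert(first_element(a) == 1);
    assert(first_element(b) == 2);      // an int[20] is accepted just the same
    assert(element_count(a) == 10);
    assert(element_count(b) == 20);
}
```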

I know everyone who answered is an expert in C/C++, but I think this is the primary reason:
num2 = num1;
Here you are trying to change the base address of the array, which is not permissible.
and of course,
struct2 = struct1;
Here, object struct1 is assigned to another object.

Another reason no further efforts were made to beef up arrays in C is probably that array assignment would not be that useful. Even though it can be easily achieved in C by wrapping the array in a struct (and the struct's address can simply be cast to the array's address, or even to the address of the array's first element, for further processing), this feature is rarely used. One reason is that arrays of different sizes are incompatible, which limits the benefits of assignment or, relatedly, of passing them to functions by value.
Most functions with array parameters in languages where arrays are first-class types are written for arrays of arbitrary size. The function then usually iterates over the given number of elements, information that the array itself provides. (In C the idiom is, of course, to pass a pointer and a separate element count.) A function which accepts an array of just one specific size is not needed as often, so not much is missed. (This changes when you can leave it to the compiler to generate a separate function for any occurring array size, as with C++ templates; this is the reason why std::array is useful.)
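A brief sketch of what `std::array` adds. It is essentially the struct-wrapper trick packaged as a library type, so it can be assigned element-wise. Arrays of different sizes are still distinct, unassignable types.

```cpp
#include <array>
#include <cassert>
#include <type_traits>

void demo_std_array() {
    std::array<int, 3> num1 = {{1, 2, 3}};
    std::array<int, 3> num2{};
    num2 = num1;                 // element-wise copy, like the struct wrapper
    assert(num2[2] == 3);
    num2[0] = 42;
    assert(num1[0] == 1);        // num1 keeps its own elements
}

// Arrays of different sizes remain incompatible types.
static_assert(!std::is_assignable<std::array<int, 3>&,
                                  const std::array<int, 4>&>::value,
              "array<int, 3> and array<int, 4> are unrelated types");
```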

Related


Why C++ forbids new T[n](arg...)?

This question seems ancient (since C++98), but a quick search didn't lead me to an answer.
std::size_t n = 100;
std::unique_ptr<int[]> data1(new int[n]);     // ok, default-initialized (indeterminate values)
std::unique_ptr<int[]> data2(new int[n]());   // ok, value-initialized (all zero)
std::unique_ptr<int[]> data3(new int[n](5));  // not allowed, but why?
What's the rationale behind this restriction? Some UDTs cannot be default-constructed, so those types cannot be used with new[].
Please don't drift into suggesting something like std::vector, or just say that's how the standard defines it; everyone knows that. I want to know the reason why new T[n](arg...) is forbidden by the standard.
The first part of the answer to "why is it forbidden" is almost tautological: because it is not allowed by the standard. I know you probably don't like such an answer, but that's the nature of the beast, sorry.
And why should it be allowed anyway? What would it mean? In your very very very simple case, initializing every int with a specific value is fairly reasonable. But for normal (statically allocated) array initialization, the rule is that each element in the right-hand side {} is passed to an element of the left-hand side array, with extra elements getting default-initialization treatment. I.e.,
int data[100] = {5};
would only initialize the first element with 5; the remaining elements are zero-initialized.
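That brace rule can be checked directly. The sketch below, which uses a hypothetical function name, also shows that C++11 array new with a braced list follows the same rule.

```cpp
#include <cassert>

void demo_partial_init() {
    int data[4] = {5};              // only data[0] has an explicit initializer
    assert(data[0] == 5);
    assert(data[1] == 0 && data[2] == 0 && data[3] == 0);   // the rest are zero

    int* heap = new int[4]{5};      // C++11 array new: same rule
    assert(heap[0] == 5 && heap[3] == 0);
    delete[] heap;
}
```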
But let's look at another example, which isn't even very contrived, which shows that what you ask for doesn't really make a lot of sense in a general context.
struct Foo {
int a,b,c,d;
Foo(int a=0, int b=0, int c=0, int d=0)
: a(a), b(b), c(c), d(d) {}
};
...
Foo *f = new Foo[4](1,2,3,4); // <-- what does this mean?!?!
Should there be four Foo(1,2,3,4)s? Or [Foo(1,2,3,4), Foo(), Foo(), Foo()]? Or maybe [Foo(1), Foo(2), Foo(3), Foo(4)]? Or why not [Foo(1,2,3), Foo(4), Foo(), Foo()]? What if one of Foo's arguments was an rvalue reference or something? There are just so many cases in which there is no obvious Right Thing that the compiler should do. Most of the examples I just gave have valid use cases, and there isn't one that's clearly better than the others.
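Since C++11, braced array new makes the caller pick one of those meanings explicitly. The sketch below repeats `Foo` so it stands alone, and `demo_braced_new` is a hypothetical name. It shows three of the interpretations spelled out, each with a single unambiguous reading.

```cpp
#include <cassert>

struct Foo {
    int a, b, c, d;
    Foo(int a = 0, int b = 0, int c = 0, int d = 0)
        : a(a), b(b), c(c), d(d) {}
};

void demo_braced_new() {
    // One initializer per element: [Foo(1), Foo(2), Foo(3), Foo(4)].
    Foo* p = new Foo[4]{1, 2, 3, 4};
    assert(p[0].a == 1 && p[3].a == 4 && p[3].b == 0);
    delete[] p;

    // The first element gets all four values; the rest are Foo().
    Foo* q = new Foo[4]{{1, 2, 3, 4}};
    assert(q[0].d == 4 && q[1].a == 0);
    delete[] q;

    // Identical elements have to be listed one by one.
    Foo* r = new Foo[2]{{1, 2, 3, 4}, {1, 2, 3, 4}};
    assert(r[1].c == 3);
    delete[] r;
}
```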
PS: You can achieve what you want with, e.g.,
std::vector<int> data(n, 5);
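A sketch of a few standard ways to get "n elements, all from the same value", including one for a type with no default constructor. `NoDefault` and `demo_fill_alternatives` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical type without a default constructor.
struct NoDefault {
    explicit NoDefault(int v) : value(v) {}
    int value;
};

void demo_fill_alternatives(std::size_t n) {
    std::vector<int> data(n, 5);                   // n copies of 5
    assert(data.size() == n && (n == 0 || data.back() == 5));

    std::unique_ptr<int[]> raw(new int[n]);        // default-initialized ints...
    std::fill_n(raw.get(), n, 5);                  // ...then overwritten with 5
    assert(n == 0 || raw[n - 1] == 5);

    std::vector<NoDefault> objs(n, NoDefault(5));  // n copies of one prototype object
    assert(n == 0 || objs[0].value == 5);
}
```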
some UDTs don't even have a default ctor, so those types cannot be used with new[]
I'm not sure what you mean by this. E.g. int does not have a default constructor. However, you can initialize it as new int(3) or as new int[n](), as you already know. The event that takes place here is called initialization. Initialization can be carried out by constructors, but that's just a specific kind of initialization applicable to class types only. int is not a class type and constructors are completely inapplicable to int. So, you should not even be mentioning constructors with regard to int.
As for new int[n](5)... What did you expect to happen in this case? C++ does not support such syntax for array initialization. What did you want it to mean? You have n array elements and only one initializer. How are you proposing to initialize n array elements with only one initializer? Use the value 5 to initialize each array element? But C++ never had such multi-initialization. Even modern C++ doesn't.
You seem to have adopted this "multi-initialization" interpretation of new int[n](5) syntax as the one and only "obviously natural" way for it to behave. However, this is not necessarily that clear-cut. Historically, the C++ language (and the C language) followed a different philosophy with regard to initializers that are "smaller" or "shorter" than the aggregate being initialized. Historically, the language used the explicitly specified initializers to initialize the sub-objects at the beginning of the aggregate, while the rest of the sub-objects got default-initialized (sticking to C++98 terminology). From this point of view, you can actually see the () initializer in new int[n]() not as your "multi-initializer", but rather as an initializer only for the very first element of the array. Meanwhile, the rest of the elements get default-initialized (producing the same effect as () would). Granted, one can argue that the above logic usually applies to { ... } initializers, not to (...) initializers, but nevertheless this general design principle is present in the language.
It's not clear what new int[n](5) would even mean. new int[5]{1,2,3,4,5} is perfectly well-defined, however.
I'm going to assume you mean for int[n](...) to construct each array element in the same way with the given arguments. Your use case for such a syntax is for data types without a default constructor, but I posit that you don't actually solve that use case: that for many (most?) arrays of such types, each object needs to be constructed differently.
The original expectation is to allow new T[n](arg…) to call T(arg…) to initialize each element.
It turns out that people don't even agree on what new T[n](arg…) would mean.
I gathered some good points from the answers and comments; here's the summary:
Inconsistent meaning. A parenthesized initializer initializes the object as a whole; for an array, the only viable one is (), which value-initializes the array and its elements. Giving T[n](arg…) a new meaning would conflict with the current meaning of parenthesized initializers.
No general way to channel the args. Consider a type T with ctor T(int, Ref&, Rval&&), and the usage new T[n](++i, ref, Rval{}). If the arguments are supplied literally (i.e. T(++i, ref, Rval{}) is called for each element), ++i will be evaluated multiple times. If the arguments are supplied through some temporaries, how do you decide that ref should be passed by reference while Rval{} should be passed as a prvalue?
In short, the syntax seems plausible but doesn't actually make sense and is not generally implementable.
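To make the "channel the args" problem concrete, here is a sketch of one possible meaning implemented as a library helper. `make_n` and `Point` are hypothetical names and not a standard facility. The arguments are evaluated once at the call site and then passed to every constructor as const lvalues, so `++i` runs once, but an rvalue argument can never be moved into an element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds n elements, each as T(args...).
// Arguments arrive as const lvalue references, so they are evaluated once
// and are never moved from; rvalue-ness is deliberately not forwarded.
template <typename T, typename... Args>
std::vector<T> make_n(std::size_t n, const Args&... args) {
    std::vector<T> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        out.emplace_back(args...);
    return out;
}

struct Point {
    Point(int x, int y) : x(x), y(y) {}   // no default constructor
    int x, y;
};

void demo_make_n() {
    int i = 0;
    std::vector<Point> pts = make_n<Point>(3, ++i, 7);
    assert(pts.size() == 3);
    assert(i == 1);                        // ++i was evaluated only once
    assert(pts[0].x == 1 && pts[2].x == 1 && pts[2].y == 7);
}
```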


Why can't arrays be passed as function arguments?

Why can't you pass arrays as function arguments?
I have been reading this C++ book that says 'you can't pass arrays as function arguments', but it never explains why. Also, when I looked it up online I found comments like 'why would you do that anyway?' It's not that I would do it, I just want to know why you can't.
Why can't arrays be passed as function arguments?
They can:
void foo(const int (&myArray)[5]) {
// `myArray` is the original array of five integers
}
In technical terms, the type of the argument to foo is "reference to array of 5 const ints"; with references, we can pass the actual object around (disclaimer: terminology varies by abstraction level).
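One way to see that the reference really refers to the whole array is to ask for its size inside the function. A minimal sketch; byte_size is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>

// The parameter is bound to the caller's array, so its type is still
// "array of 5 const int" and sizeof sees all five elements.
std::size_t byte_size(const int (&myArray)[5]) {
    return sizeof myArray;
}
```

Because the size is part of the parameter type, passing an int[4] to byte_size is a compile error rather than a silent conversion.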
What you can't do is pass by value, because for historical reasons we shall not copy arrays. Instead, attempting to pass an array by value into a function (or, to pass a copy of an array) leads its name to decay into a pointer. (some resources get this wrong!)
Array names decay to pointers for pass-by-value
This means:
void foo(int* ptr);
int ar[10]; // an array
foo(ar); // automatically passing ptr to first element of ar (i.e. &ar[0])
There's also the hugely misleading "syntactic sugar" that looks like you can pass an array of arbitrary length by value:
void foo(int ptr[]);
int ar[10]; // an array
foo(ar);
But, actually, you're still just passing a pointer (to the first element of ar). foo is the same as it was above!
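A short sketch that makes the pointer visible, assuming C++11 for std::is_same. The function names are hypothetical:

```cpp
#include <cassert>
#include <type_traits>

// Written with [], but the parameter's declared type is adjusted to int*.
bool param_is_pointer(int ptr[]) {
    return std::is_same<decltype(ptr), int*>::value;
}

// Writes land in the caller's array, because only a pointer was passed.
void set_first(int ptr[]) {
    ptr[0] = 42;
}
```

After set_first(ar), ar[0] is 42 in the caller: no copy of the array was ever made.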
Whilst we're at it, the following function also doesn't really have the signature that it seems to. Look what happens when we try to call this function without defining it:
void foo(int ar[5]);
int main() {
int ar[5];
foo(ar);
}
// error: undefined reference to `foo(int*)'
So foo takes int* in fact, not int[5]!
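The same fact can be checked at compile time instead of through a linker error. A minimal sketch, assuming C++11:

```cpp
#include <cassert>
#include <type_traits>

void foo(int ar[5]);  // declared but never defined or called here

// The [5] is discarded: foo's function type is exactly void(int*).
static_assert(std::is_same<decltype(foo), void(int*)>::value,
              "array parameter adjusted to pointer");
```

decltype does not use foo, so no definition is needed for this to compile.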
But you can work-around it!
You can hack around this by wrapping the array in a struct or class, because the implicitly-defined copy constructor will copy the array:
struct Array_by_val
{
int my_array[10];
};
void func (Array_by_val x) {}
int main() {
Array_by_val x;
func(x);
}
This is somewhat confusing behaviour.
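A sketch that shows the copy actually happening, reusing Array_by_val from above; modify_copy is a hypothetical name:

```cpp
#include <cassert>

struct Array_by_val {
    int my_array[10];
};

// x is initialized by the implicitly-defined copy constructor, which copies
// every element of my_array; changing x leaves the caller's object untouched.
int modify_copy(Array_by_val x) {
    x.my_array[0] = 99;
    return x.my_array[0];
}
```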
Or, better, a generic pass-by-reference approach
In C++, with some template magic, we can make a function both re-usable and able to receive an array:
template <typename T, size_t N>
void foo(const T (&myArray)[N]) {
// `myArray` is the original array of N Ts
}
But we still can't pass one by value. Something to remember.
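A usage sketch of that template form; length and total are hypothetical names. Both T and N are deduced from the argument, so the size is available inside the function without a separate count:

```cpp
#include <cassert>
#include <cstddef>

// N (and T) are deduced from the argument, so the size is known inside.
template <typename T, std::size_t N>
std::size_t length(const T (&)[N]) {
    return N;
}

template <typename T, std::size_t N>
T total(const T (&myArray)[N]) {
    T sum{};
    for (std::size_t i = 0; i < N; ++i) sum += myArray[i];
    return sum;
}
```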
The future...
And since C++11 is just over the horizon, and C++0x support is coming along nicely in the mainstream toolchains, you can use the lovely std::array inherited from Boost! I'll leave researching that as an exercise to the reader.
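A brief starting point for that exercise, assuming C++11. std::array is an aggregate that wraps a built-in array (in common implementations, a struct with a single T[N] member), so it gets the struct behaviour discussed above. The function doubled is hypothetical:

```cpp
#include <array>
#include <cassert>

// Because std::array wraps a built-in array in a class type, it can be
// assigned, compared with ==, and passed or returned by value.
std::array<int, 3> doubled(std::array<int, 3> a) {  // a is a copy
    for (int& v : a) v *= 2;
    return a;
}
```

With std::array<int, 3> x and y, y = x; compiles and copies all three elements, and doubled(x) leaves x unchanged.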
So I see answers explaining "Why doesn't the compiler allow me to do this?" rather than "What caused the standard to specify this behavior?" The answer lies in the history of C. This is taken from "The Development of the C Language" by Dennis Ritchie.
In the proto-C languages, memory was divided into "cells", each containing a word. These could be dereferenced using what became the unary * operator; yes, these were essentially typeless languages, much like some of today's toy languages such as Brainf_ck. Syntactic sugar allowed one to pretend a pointer was an array:
a[5]; // equivalent to *(a + 5)
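That rewrite rule still defines subscripting in C and C++ today. A tiny sketch checking that a[i], *(a + i), and even the odd-looking i[a] name the same element; same_element is a hypothetical name:

```cpp
#include <cassert>

// E1[E2] is defined as *((E1) + (E2)), so a[i], *(a + i) and even i[a]
// all refer to the same element.
bool same_element(const int* a, int i) {
    return &a[i] == &*(a + i) && &a[i] == &i[a];
}
```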
Then, automatic allocation was added:
auto a[10]; // allocate 10 cells, assign pointer to a
// note that we are still typeless
a += 1; // remember that a is a pointer
At some point, the auto storage specifier behavior became the default. (If you have ever wondered what the point of the auto keyword was, this is it.) Pointers and arrays were left to behave in somewhat quirky ways as a result of these incremental changes. Perhaps the types would behave more alike if the language had been designed from a bird's-eye view. As it stands, this is just one more C/C++ gotcha.
Arrays are in a sense second-class types, something that C++ inherited from C.
Quoting 6.3.2.1p3 in the C99 standard:
Except when it is the operand of the sizeof operator or the unary
& operator, or is a string literal used to initialize an array, an
expression that has type "array of type" is converted to an
expression with type "pointer to type" that points to the initial
element of the array object and is not an lvalue. If the array object
has register storage class, the behavior is undefined.
The same paragraph in the C11 standard is essentially the same, with the addition of the new _Alignof operator. (Both quotations are from drafts which are very close to the official standards. UPDATE: The _Alignof addition was actually an error in the N1570 draft, corrected in the released C11 standard. _Alignof can't be applied to an expression, only to a parenthesized type name, so C11 has only the same three exceptions that C99 and C90 did. But I digress.)
I don't have the corresponding C++ citation handy, but I believe it's quite similar.
So if arr is an array object, and you call a function func(arr), then func will receive a pointer to the first element of arr.
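A compile-time sketch of the conversion and its exceptions in C++, assuming C++11. It covers only the sizeof and unary & exceptions from the quoted paragraph, which C++ shares; receive is a hypothetical name:

```cpp
#include <cassert>
#include <type_traits>

int arr[3] = {1, 2, 3};

// Exception 1: the operand of sizeof keeps its array type.
static_assert(sizeof arr == 3 * sizeof(int), "no decay under sizeof");

// Exception 2: &arr is a pointer to the whole array, int (*)[3], not int*.
static_assert(std::is_same<decltype(&arr), int (*)[3]>::value, "no decay under &");

// Elsewhere (here, unary +) the array is converted to a pointer to its first element.
static_assert(std::is_same<decltype(+arr), int*>::value, "decays to int*");

// Passing arr as a function argument is one of those "elsewhere" cases.
const int* receive(const int* p) { return p; }
```

receive(arr) returns &arr[0], the address of the initial element, exactly as the quoted paragraph describes.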
So far, this is more or less "it works that way because it's defined that way", but there are historical and technical reasons for it.
Permitting array parameters wouldn't allow for much flexibility (without further changes to the language), since, for example, char[5] and char[6] are distinct types. Even passing arrays by reference doesn't help with that (unless there's some C++ feature I'm missing, always a possibility). Passing pointers gives you tremendous flexibility (perhaps too much!). The pointer can point to the first element of an array of any size -- but you have to roll your own mechanism to tell the function how big the array is.
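A sketch of that roll-your-own mechanism, the pointer-plus-count idiom; count_char is a hypothetical name. The same function serves arrays of any length because the length travels as a separate argument:

```cpp
#include <cassert>
#include <cstddef>

// The C idiom: pointer to the first element plus an explicit element count.
// The same function serves char[5], char[6], or any other length.
std::size_t count_char(const char* s, std::size_t n, char c) {
    std::size_t hits = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (s[i] == c) ++hits;
    return hits;
}
```

The caller is responsible for passing a correct count, e.g. count_char(buf, sizeof buf, 'a') for a char array buf; nothing stops a wrong count from reading past the end.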
Designing a language so that arrays of different lengths are somewhat compatible while still being distinct is actually quite tricky. In Ada, for example, the equivalents of char[5] and char[6] are the same type, but different subtypes. More dynamic languages make the length part of an array object's value, not of its type. C still pretty much muddles along with explicit pointers and lengths, or pointers and terminators. C++ inherited all that baggage from C. It mostly punted on the whole array thing and introduced vectors, so there wasn't as much need to make arrays first-class types.
TL;DR: This is C++, you should be using vectors anyway! (Well, sometimes.)
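A minimal sketch of the vector alternative; total and clear_copy are hypothetical names. One ordinary function handles any length, and passing by value really does copy:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A vector knows its own length, so one non-template function handles any size.
// Taking it by const reference avoids copying the elements.
long total(const std::vector<int>& v) {
    long sum = 0;
    for (int x : v) sum += x;
    return sum;
}

// Unlike a built-in array, a vector passed by value really is copied.
std::size_t clear_copy(std::vector<int> v) {
    v.clear();
    return v.size();
}
```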
Arrays are not passed by value because arrays are essentially contiguous blocks of memory. If you wanted to pass an array by value, you could declare it within a structure and then access it through the structure.
This itself has performance implications, because it means you will use more space on the stack. Passing a pointer is faster because far less data has to be copied onto the stack.
I believe the reason C++ did this is that, when it was created, sending the whole array rather than its address might have taken up too many resources. That is just my view on the matter, and an assumption.
It's because of a technical reason. Arguments are passed on the stack; an array can have a huge size, megabytes and more. Copying that data to the stack on every call will not only be slower, but it will exhaust the stack pretty quickly.
You can overcome that limitation by putting an array into a struct (or by using boost::array):
struct Array
{
int data[512*1024];
int& operator[](int i) { return data[i]; }
};
void foo(Array byValueArray) { /* ... */ }
Try to make nested calls of that function and see how many stack overflows you'll get!
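If you do need such a large wrapper, a sketch of how to avoid the stack copies, assuming C++11: take the struct by const reference and create the object itself on the heap. The const operator[] overload and the names read_at and make_array are additions for illustration, not part of the answer above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

struct Array {
    int data[512 * 1024];  // 2 MiB when int is 4 bytes
    int& operator[](std::size_t i) { return data[i]; }
    const int& operator[](std::size_t i) const { return data[i]; }
};

// By-value parameters copy all of data; a const reference passes only an address.
int read_at(const Array& a, std::size_t i) { return a[i]; }

// Creating the object on the heap keeps it off the stack as well.
// new Array() value-initializes, so every element starts at zero.
std::unique_ptr<Array> make_array() {
    return std::unique_ptr<Array>(new Array());
}
```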
