Why use C++ container "array" rather than traditional C array? - c++

Popular opinion is that C++ array is safer with almost equal efficiency. Except for checking whether the index is out of range and not allowing implicit type cast to pointer, are there any other features?
Besides, why is implicit type cast of traditional C array regarded as bad practice?
I'm not very good at C++ (or at English), so please forgive me if this question is not worth an answer. Thank you!

The main benefit is that a std::array is a first-class object in C++, which means you can do all the things with it that you can do with any other "normal" C++ object.
A traditional C array, on the other hand, is not a first-class object, which means a lot of things won't work:
#include <array>
// Works!
std::array<int, 5> ReturnAStdArrayOfSixes()
{
std::array<int, 5> ret;
ret.fill(6);
return ret;
}
// Doesn't compile, sorry
int[5] ReturnACArrayOfSixes()
{
int ret[5];
for (int i=0; i<5; i++) ret[i] = 6;
return ret;
}
int main(int, char **)
{
std::array<int, 5> stdArray1;
std::array<int, 5> stdArray2;
int cArray1[5];
int cArray2[5];
stdArray1 = stdArray2; // works
cArray1 = cArray2; // error: array type 'int [5]' is not assignable
if (stdArray1 < stdArray2) {/* do something */} // compares arrays' contents lexicographically
if (cArray1 < cArray2) {/* do something */} // compiles, but compares pointers which probably isn't what you wanted
return 0;
}
As for "implicit type cast of traditional C array" (by which I think you mean the implicit decay of an array-type into a pointer-type), that's a useful mechanism but it can bite you if you aren't expecting it, e.g.:
// This code works as expected
int myArray[5];
std::cout << "There are " << (sizeof(myArray)/sizeof(int)) << " items in myArray\n";
Now let's refactor the above code into a nice function so we can re-use it:
void PrintArraySize(int myArray[5])
{
std::cout << "There are " << (sizeof(myArray)/sizeof(int)) << " items in myArray\n";
}
int main(int, char **)
{
int myArray[5];
PrintArraySize(myArray);
return 0;
}
... Oh noes! Now PrintArraySize() is buggy, and prints a number much smaller than the number of items in the array! (The reason is that myArray in the function has implicitly decayed to a pointer, so sizeof(myArray) in the function evaluates to the sizeof(int *), e.g. 4 or 8 bytes, rather than the size of the contents of the passed-in array)
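One way to sidestep the decay problem, sketched here on the assumption that the size is known at compile time, is to take a std::array by const reference. The names CountItems and CountItemsAnySize are made up for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: a std::array parameter never decays to a pointer,
// so the element count is still available inside the function.
std::size_t CountItems(const std::array<int, 5> &myArray)
{
    return myArray.size();   // the size is part of the type
}

// Same idea for any length: N is deduced from the argument's type.
template <std::size_t N>
std::size_t CountItemsAnySize(const std::array<int, N> &myArray)
{
    return myArray.size();
}
```

Calling CountItems with a std::array<int, 5> yields 5 no matter how deep in the call chain it happens, because nothing about the type is lost at the function boundary.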

Not sure what you exactly meant by your question about the implicit cast of a C array, but I'm guessing you meant either casting implicitly the values stored inside (like in int a[10]; float f=a[0]), or casting the implicit pointer to the first element (like in int a[10]; void* p=a;). Anyway, casting any pointer to void* pointer implicitly is ok (you just need to know what is the size of the memory stored at such pointer if you want to read it), because void* pointers aren't really meant for data manipulation, so whenever you use a void* pointer, you know what you are doing, right?
But a bit more seriously now - implicit casting, in general, is something that the compiler does according to the language rules, without asking the programmer if it is what was actually desired. That's why it sometimes can lead to errors in the code (programmer's oversight, or incorrect assumptions as to what is happening during such a cast).
When you cast explicitly, you first of all make sure that you get the desired result. More importantly, if your explicit cast is ambiguous, or turns out to be impossible, the compiler will tell you so, and you will be able to revise your casting decisions early and prevent bugs in your code.
So explicit casting is just a way to talk to the compiler more strictly, making sure that it understands better what you meant when you wrote your code. Then, in return, the compiler gives you hints when you cast something incorrectly (with implicit casting it would do things its own way, and you wouldn't even know when the effect of such a cast was different than you wanted).

Related

I have a question about passing std::array into function

I'm learning C++, and the syntax for passing a std::array into a function confuses me.
#include <iostream>
#include <array>
using namespace std;
void printArray(const std::array<int, 5> &n)
{
std::cout << "length: " << n.size() << endl;
for (int j = 0; j < n.size(); j++ )
{
cout << "n[" << j << "] = " << n[j] << endl;
}
}
int main()
{
array<int, 5> n = {1,2,3,4,5};
printArray(n);
return 0;
}
I want to ask about 'const': what role does it play, and what effect does it have if I don't use it?
Why do we have to use &n when the name of an array is already a pointer?
Depending on the argument you can make certain assumptions about the function.
void printArrayA(std::array<int, 5> n)
If I call printArrayA, the array I pass to it is copied, so the function can't change my array, but there is the overhead of copying it.
void printArrayB(std::array<int, 5> &n)
If I call printArrayB, the array I pass to it is not copied, and the function can change the array or the elements stored in it.
void printArrayC(const std::array<int, 5> &n)
If I call printArrayC, the array I pass to it is not copied, and because it is const the function can't make any changes to that array or its elements. (Well, in theory, it could cast away the const, but that's not something that should be done; a const should indicate to the caller that the object won't be changed.)
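The three signatures can be compared side by side. This is a minimal sketch; the function names are invented for the illustration and are not from the question.

```cpp
#include <array>
#include <cassert>

// By value: the function works on its own copy of the array.
void setFirstByValue(std::array<int, 5> n) { n[0] = 99; }

// By reference: the function modifies the caller's array.
void setFirstByRef(std::array<int, 5> &n) { n[0] = 99; }

// By const reference: no copy is made, and writes are rejected at compile time.
int readFirst(const std::array<int, 5> &n)
{
    // n[0] = 99;   // would not compile: n is const
    return n[0];
}
```

After setFirstByValue the caller's array is unchanged, while setFirstByRef leaves 99 in its first element.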
void printArray(const std::array<int, 5> &n)
What this does is to allow you to pass in to the function an unchangeable (without const-casting it, anyway) reference to the std::array. The reference is a lot smaller than the array itself, typically just a pointer, and the const bit ensures the function does not attempt to change the underlying object.
It's usually used to ensure you don't copy "large" things where unnecessary, things like vectors, arrays or strings.
In this case const means that printArray will not modify the object passed to it.
An std::array is not a C-style array (such as int a[10]), so you are not passing a pointer, you are passing a reference to an std::array object.
Your question is not only about 'std::array', it is about 'const' and 'references'.
The 'const' keyword means that you can call only the 'const' methods of that class, which means that (assuming the class is implemented correctly) you can't modify the object in that function.
'&' means that you pass the parameter by 'reference' and not by 'value', if you are not familiar with that difference you may want to read this: Value vs. Reference
As others have mentioned, the name of a std::array does not decay to a pointer. To convert to a pointer, you would call .data().
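A short sketch of the .data() conversion follows. sumInts is a hypothetical stand-in for any C-style function that expects a pointer plus a length.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical C-style function that expects a pointer and a length.
int sumInts(const int *values, std::size_t count)
{
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) total += values[i];
    return total;
}

int sumArray(const std::array<int, 5> &n)
{
    // sumInts(n, n.size());              // would not compile: no implicit decay
    return sumInts(n.data(), n.size());   // explicit conversion to const int*
}
```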
Why do you want to pass by (const) reference, and why does a foo[] decay to a foo* const in many contexts?
You (practically) never want to copy arrays. It takes a huge amount of time and memory. You always want to pass them to a function by reference. On top of that, most algorithms that work on arrays are supposed to work on arrays of any size. (Brian Kernighan, the K in K&R, particularly considered it a flaw in Pascal that the size of an array is part of its type.) Therefore, back in the ’70s, the designers of C made a rule that passing an array to a function is the same as passing a pointer to its first element.
C++ stuck with that for the sake of backward compatibility: a lot of programmers compile C code in C++ compilers. One famous pitfall is trying to take the sizeof an array parameter in C++. Because array parameters decay to pointers, this gives you the size of a pointer. In C++, you have the option of passing a type (&name)[size] instead; that is, passing a type[size] by reference. This preserves the size as part of the type. (C also got some new syntax in 1999 that never made it over to C++, including the ability to pass the size of an array parameter.)
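The sizeof pitfall and the reference-to-array alternative can be contrasted in a few lines. This is only a sketch, and both function names are made up for the example.

```cpp
#include <cassert>
#include <cstddef>

// A parameter written as int arr[5] is adjusted to int*, which is spelled
// out here; sizeof therefore yields the size of a pointer.
std::size_t bytesViaPointer(int *arr) { return sizeof(arr); }

// Reference to an array of exactly 5 ints: the size stays in the type,
// so sizeof yields the size of the whole array.
std::size_t bytesViaReference(int (&arr)[5]) { return sizeof(arr); }
```

With the reference version, passing an int[4] is a compile-time error rather than a silent mismatch.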
In C++, a std::array has no special syntax and works like any other type. You never want to copy an entire array that you aren’t going to modify. You still want to pass arrays by reference, so you use the standard syntax for that: &. Whenever you don’t need to modify a parameter, you declare it const. This helps the compiler detect logic errors and, sometimes, optimize.

What real use does a double pointer have?

I have searched and searched for an answer to this but can't find anything I actually "get".
I am very very new to c++ and can't get my head around the use of double, triple pointers etc.. What is the point of them?
Can anyone enlighten me
Honestly, in well-written C++ you should very rarely see a T** outside of library code. In fact, the more stars you have, the closer you are to winning an award of a certain nature.
That's not to say that a pointer-to-pointer is never called for; you may need to construct a pointer to a pointer for the same reason that you ever need to construct a pointer to any other type of object.
In particular, I might expect to see such a thing inside a data structure or algorithm implementation, when you're shuffling around dynamically allocated nodes, perhaps?
Generally, though, outside of this context, if you need to pass around a reference to a pointer, you'd do just that (i.e. T*&) rather than doubling up on pointers, and even that ought to be fairly rare.
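To make the T*& versus T** comparison concrete, here is a small sketch. skipSpaces and skipSpacesPtr are hypothetical helpers that both advance the caller's pointer.

```cpp
#include <cassert>

// Reference to a pointer: the function changes the caller's pointer itself.
void skipSpaces(const char *&p)
{
    while (*p == ' ') ++p;
}

// The same thing written with a pointer to a pointer.
void skipSpacesPtr(const char **p)
{
    while (**p == ' ') ++*p;
}
```

The first is called as skipSpaces(s) and the second as skipSpacesPtr(&s); the bodies differ only in the extra level of dereferencing.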
On Stack Overflow you're going to see people doing ghastly things with pointers to arrays of dynamically allocated pointers to data, trying to implement the least efficient "2D vector" they can think of. Please don't be inspired by them.
In summary, your intuition is not without merit.
An important reason why you should/must know about pointer-to-pointer-... is that you sometimes have to interface with other languages (like C for instance) through some API (for instance the Windows API).
Those APIs often have functions that have an output-parameter that returns a pointer. However those other languages often don't have references or compatible (with C++) references. That's a situation when pointer-to-pointer is needed.
It's less used in c++. However, in C, it can be very useful. Say that you have a function that will malloc some random amount of memory and fill the memory with some stuff. It would be a pain to have to call a function to get the size you need to allocate and then call another function that will fill the memory. Instead you can use a double pointer. The double pointer allows the function to set the pointer to the memory location. There are some other things it can be used for but that's the best thing I can think of.
int func(char** mem){
*mem = malloc(50);
return 50;
}
int main(){
char* mem = NULL;
int size = func(&mem);
free(mem);
}
I am very very new to c++ and can't get my head around the use of double, triple pointers etc.. What is the point of them?
The trick to understanding pointers in C is simply to go back to the basics, which you were probably never taught. They are:
Variables store values of a particular type.
Pointers are a kind of value.
If x is a variable of type T then &x is a value of type T*.
If x evaluates to a value of type T* then *x is a variable of type T. More specifically...
... if x evaluates to a value of type T* that is equal to &a for some variable a of type T, then *x is an alias for a.
Now everything follows:
int x = 123;
x is a variable of type int. Its value is 123.
int* y = &x;
y is a variable of type int*. x is a variable of type int. So &x is a value of type int*. Therefore we can store &x in y.
*y = 456;
y evaluates to the contents of variable y. That's a value of type int*. Applying * to a value of type int* gives a variable of type int. Therefore we can assign 456 to it. What is *y? It is an alias for x. Therefore we have just assigned 456 to x.
int** z = &y;
What is z? It's a variable of type int**. What is &y? Since y is a variable of type int*, &y must be a value of type int**. Therefore we can assign it to z.
**z = 789;
What is **z? Work from the inside out. z evaluates to an int**. Therefore *z is a variable of type int*. It is an alias for y. Therefore this is the same as *y, and we already know what that is; it's an alias for x.
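Put together, the steps above form a small function. The wrapper name walkThrough is invented here; the statements are the ones from the walkthrough, with assertions added to check each alias.

```cpp
#include <cassert>

// Returns the final value of x after the three assignments.
int walkThrough()
{
    int x = 123;
    int *y = &x;      // y holds the address of x
    *y = 456;         // *y is an alias for x
    assert(x == 456);
    int **z = &y;     // z holds the address of y
    assert(*z == y);  // *z is an alias for y
    **z = 789;        // **z is therefore an alias for x
    return x;
}
```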
No really, what's the point?
Here, I have a piece of paper. It says 1600 Pennsylvania Avenue Washington DC. Is that a house? No, it's a piece of paper with the address of a house written on it. But we can use that piece of paper to find the house.
Here, I have ten million pieces of paper, all numbered. Paper number 123456 says 1600 Pennsylvania Avenue. Is 123456 a house? No. Is it a piece of paper? No. But it is still enough information for me to find the house.
That's the point: often we need to refer to entities through multiple levels of indirection for convenience.
That said, double pointers are confusing and a sign that your algorithm is insufficiently abstract. Try to avoid them by using good design techniques.
A double-pointer, is simply a pointer to a pointer. A common usage is for arrays of character strings. Imagine the first function in just about every C/C++ program:
int main(int argc, char *argv[])
{
...
}
Which can also be written
int main(int argc, char **argv)
{
...
}
The variable argv is a pointer to the first element of an array of pointers to char. This is a standard way of passing around arrays of C "strings". Why do that? I've seen it used for multi-language support, blocks of error strings, etc.
Don't forget that a pointer is just a number - the index of the memory "slot" inside a computer. That's it, nothing more. So a double-pointer is index of a piece of memory that just happens to hold another index to somewhere else. A mathematical join-the-dots if you like.
This is how I explained pointers to my kids:
Imagine the computer memory is a series of boxes. Each box has a number written on it, starting at zero, going up by 1, to however many bytes of memory there is. Say you have a pointer to some place in memory. This pointer is just the box number. My pointer is, say 4. I look into box #4. Inside is another number, this time it's 6. So now we look into box #6, and get the final thing we wanted. My original pointer (that said "4") was a double-pointer, because the content of its box was the index of another box, rather than being a final result.
It seems in recent times pointers themselves have become a pariah of programming. Back in the not-too-distant past, it was completely normal to pass around pointers to pointers. But with the proliferation of Java, and increasing use of pass-by-reference in C++, the fundamental understanding of pointers declined - particularly around when Java became established as a first-year computer science beginners language, over say Pascal and C.
I think a lot of the venom about pointers is because people just don't ever understand them properly. Things people don't understand get derided. So they became "too hard" and "too dangerous". I guess with even supposedly learned people advocating Smart Pointers, etc. these ideas are to be expected. But in reality they're a very powerful programming tool. Honestly, pointers are the magic of programming, and after all, they're just a number.
In many situations, a Foo*& is a replacement for a Foo**. In both cases, you have a pointer whose address can be modified.
Suppose you have an abstract non-value type and you need to return it, but the return value is taken up by the error code:
error_code get_foo( Foo** ppfoo )
or
error_code get_foo( Foo*& pfoo_out )
Now a function argument being mutable is rarely useful, so the ability to change where the outermost pointer ppfoo points at is rarely useful. However, a pointer is nullable -- so if get_foo's argument is optional, a pointer acts like an optional reference.
In this case, the return value is a raw pointer. If it returns an owned resource, it should usually be instead a std::unique_ptr<Foo>* -- a smart pointer at that level of indirection.
If instead, it is returning a pointer to something it does not share ownership of, then a raw pointer makes more sense.
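For the owned-resource case, a sketch might look like the following. Foo's contents and the error_code values are assumptions made up for the example, since the answer does not define them.

```cpp
#include <cassert>
#include <memory>

struct Foo { int value = 42; };           // stand-in for the abstract type
enum class error_code { ok, no_output };  // hypothetical error codes

// The out parameter is a pointer, so callers may pass nullptr.
error_code get_foo(std::unique_ptr<Foo> *pfoo_out)
{
    if (!pfoo_out) return error_code::no_output;
    *pfoo_out = std::make_unique<Foo>();   // ownership moves to the caller
    return error_code::ok;
}
```

The caller declares a std::unique_ptr<Foo> and passes its address; the resource is released automatically when that smart pointer goes out of scope.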
There are other uses for Foo** besides these "crude out parameters". If you have a polymorphic non-value type, non-owning handles are Foo*, and the same reason why you'd want to have an int* you would want to have a Foo**.
Which then leads you to ask "why do you want an int*?" In modern C++ int* is a non-owning nullable mutable reference to an int. It behaves better when stored in a struct than a reference does (references in structs generate confusing semantics around assignment and copy, especially if mixed with non-references).
You could sometimes replace int* with std::reference_wrapper<int>, well std::optional<std::reference_wrapper<int>>, but note that is going to be 2x as large as a simple int*.
So there are legitimate reasons to use int*. Once you have that, you can legitimately use Foo** when you want a pointer to a non-value type. You can even get to int** by having a contiguous array of int*s you want to operate on.
Legitimately getting to three-star programmer gets harder. Now you need a legitimate reason to (say) want to pass a Foo** by indirection. Usually long before you reach that point, you should have considered abstracting and/or simplifying your code structure.
All of this ignores the most common reason; interacting with C APIs. C doesn't have unique_ptr, it doesn't have span. It tends to use primitive types instead of structs because structs require awkward function based access (no operator overloading).
So when C++ interacts with C, you sometimes get 0-3 more *s than the equivalent C++ code would.
The use is to have a pointer to a pointer, e.g., if you want to pass a pointer to a method by reference.
What real use does a double pointer have?
Here is practical example. Say you have a function and you want to send an array of string params to it (maybe you have a DLL you want to pass params to). This can look like this:
#include <iostream>
void printParams(const char **params, int size)
{
for (int i = 0; i < size; ++i)
{
std::cout << params[i] << std::endl;
}
}
int main()
{
const char *params[] = { "param1", "param2", "param3" };
printParams(params, 3);
return 0;
}
You will be sending an array of const char pointers, each pointing to the start of a null-terminated C string. The compiler decays your array into a pointer at the function argument, so what the function receives is a const char **: a pointer to the first pointer in the array of const char pointers. Since the array size is lost at this point, you will want to pass it as a second argument.
One case where I've used it is a function manipulating a linked list, in C.
There is
struct node { struct node *next; ... };
for the list nodes, and
struct node *first;
to point to the first element. All the manipulation functions take a struct node **, because I can guarantee that this pointer is non-NULL even if the list is empty, and I don't need any special cases for insertion and deletion:
void link(struct node *new_node, struct node **list)
{
new_node->next = *list;
*list = new_node;
}
void unlink(struct node **prev_ptr)
{
*prev_ptr = (*prev_ptr)->next;
}
To insert at the beginning of the list, just pass a pointer to the first pointer, and it will do the right thing even if the value of first is NULL.
struct node *new_node = (struct node *)malloc(sizeof *new_node);
link(new_node, &first);
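The same idea extends to removal by value. The sketch below is written in C++ rather than C, and the names Node, linkNode and removeValue are made up for it; it walks the list with a pointer to the pointer that leads to the current node.

```cpp
#include <cassert>

struct Node { int value; Node *next; };

// Insert at the position *list refers to (the head, or any next field).
void linkNode(Node *newNode, Node **list)
{
    newNode->next = *list;
    *list = newNode;
}

// Remove the first node holding `value`; returns it, or nullptr if absent.
// `pp` always points at the pointer that leads to the current node,
// so removing the head needs no special case.
Node *removeValue(Node **list, int value)
{
    for (Node **pp = list; *pp; pp = &(*pp)->next) {
        if ((*pp)->value == value) {
            Node *found = *pp;
            *pp = found->next;      // same step as unlink() above
            found->next = nullptr;
            return found;
        }
    }
    return nullptr;
}
```

Without the double pointer, the function would need one branch for "the match is the head" and another for "the match is in the middle".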
Multiple indirection is largely a holdover from C (which has neither reference nor container types). You shouldn't see multiple indirection that much in well-written C++, unless you're dealing with a legacy C library or something like that.
Having said that, multiple indirection falls out of some fairly common use cases.
In both C and C++, array expressions will "decay" from type "N-element array of T" to "pointer to T" under most circumstances (the exceptions are listed in the note at the end of this answer). So, assume an array definition like
T *a[N]; // for any type T
When you pass a to a function, like so:
foo( a );
the expression a will be converted from "N-element array of T *" to "pointer to T *", or T **, so what the function actually receives is
void foo( T **a ) { ... }
A second place they pop up is when you want a function to modify a parameter of pointer type, something like
void foo( T **ptr )
{
*ptr = new_value();
}
void bar( void )
{
T *val;
foo( &val );
}
Since C++ introduced references, you probably won't see that as often. You'll usually only see that when working with a C-based API.
You can also use multiple indirection to set up "jagged" arrays, but you can achieve the same thing with C++ containers for much less pain. But if you're feeling masochistic:
T **arr;
try
{
arr = new T *[rows];
for ( size_t i = 0; i < rows; i++ )
arr[i] = new T [size_for_row(i)];
}
catch ( std::bad_alloc& e )
{
...
}
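For comparison, the container version of the jagged array could look roughly like this. sizeForRow is a hypothetical stand-in for size_for_row, and int replaces T to keep the sketch concrete.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-length rule, standing in for size_for_row above.
std::size_t sizeForRow(std::size_t i) { return i + 1; }

// Builds a jagged array; every allocation is released automatically,
// even if a later allocation throws std::bad_alloc.
std::vector<std::vector<int>> makeJagged(std::size_t rows)
{
    std::vector<std::vector<int>> arr(rows);
    for (std::size_t i = 0; i < rows; ++i)
        arr[i].resize(sizeForRow(i));   // new elements are value-initialized to 0
    return arr;
}
```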
But most of the time in C++, the only time you should see multiple indirection is when an array of pointers "decays" to a pointer expression itself.
Note: the exceptions to the decay rule occur when the expression is the operand of the sizeof or unary & operator, or is a string literal used to initialize another array in a declaration.
In C++, if you want to pass a pointer as an out or in/out parameter, you pass it by reference:
int x;
void f(int *&p) { p = &x; }
But, a reference can't ("legally") be nullptr, so, if the pointer is optional, you need a pointer to a pointer:
void g(int **p) { if (p) *p = &x; }
Sure, since C++17 you have std::optional, but the "double pointer" has been idiomatic C/C++ code for many decades, so should be OK. Also, the usage is not so nice; you either write:
void h(std::optional<int*> &p) { if (p) *p = &x; }
which is kind of ugly at the call site, unless you already have a std::optional, or:
void u(std::optional<std::reference_wrapper<int*>> p) { if (p) p->get() = &x; }
which is not so nice in itself.
Also, some might argue that g is nicer to read at the call site:
f(p);
g(&p); // `&` indicates that `p` might change, to some folks

Can we safely call C API functions from C++ when arrays are involved?

Context:
As an old C programmer (even K&R C...) I had always believed that an array was nothing more than a contiguously allocated nonempty set of objects with a particular member object type, called the element type (from the n1570 draft for the C11 standard, 6.2.5 Types). For that reason I did not worry too much about pointer arithmetic.
I now know that an array is an object type and that it can only be created by a definition (6.1), by a new-expression (8.3.4), when implicitly changing the active member of a union (12.3), or when a temporary object is created (7.4, 15.2) (from the n4659 draft for C++17).
Problem:
I have to use a C library in which some functions return pointers to arrays of C structs. So far so good, a C struct is a POD type, and proper padding and alignment is achieved by using the standard flags of the compiler. But as the size of the array is only known at runtime, even with the correct extern "C" declarations, my function is declared to return a pointer to the first element of the array - the actual size is returned by a different function of the API.
Simplified example:
#include <iostream>
extern "C" {
struct Elt {
int ival;
//...
};
void *libinit(); // initialize the library and get a handle
size_t getNElts(void *id); // get the number of elements
struct Elt* getElts(void *id); // get access to the array of elements
void libend(void *id); // releases library internal data
}
int main() {
void *libid = libinit();
Elt* elts = getElts(libid);
size_t nelts = getNElts(libid);
for(size_t i=0; i<nelts; i++) {
std::cout << elts[i].ival << " "; // is elts[i] legal?
}
std::cout << std::endl;
libend(libid);
return 0;
}
Question:
I know that the block of memory has probably been allocated through malloc, which could allow using pointers on it, and I assume that getElts(libid)[0] does not involve Undefined Behaviour. But is it legal to use pointer arithmetic over the C array when it has never been declared as a C++ array? The API only guarantees that I have a contiguously allocated set of objects of type Elt and that getElts returns a pointer to the first element of that set.
I ask because [expr.add] explicitly restricts pointer arithmetic to within an array:
4 When an expression that has integral type is added to or subtracted from a pointer, the result has the type
of the pointer operand. If the expression P points to element x[i] of an array object x with n elements,
the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element
x[i + j] if 0 <= i + j <=n; otherwise, the behavior is undefined...
That used to be a common practice...
EDIT
To make my question clearer: I know that this would be UB if done in C++
libstub.c++
/* C++ simulation of a C implementation */
extern "C" {
struct Elt {
int ival;
//...
};
void *libinit(); // initialize the library and get a handle
size_t getNElts(void *id); // get the number of elements
struct Elt* getElts(void *id); // get access to the array of elements
void libend(void *id); // releases library internal data
}
size_t getCurrentSize() {
return 1024; // let us assume that the returned value is not a constexpr
}
void *libinit() {
size_t N = getCurrentSize();
unsigned char * storage = new unsigned char[(N + 1) * sizeof(Elt)];
// storage can provide storage for a size_t correct alignment
size_t *n = new(storage) size_t;
*n = N;
for (size_t i=1; i<=N; i++) {
// storage can provide storage for an Elt, correct alignment
Elt *elt = new(storage + i * sizeof(Elt)) Elt(); // slot 0 holds the size_t, slots 1..N hold the Elts
elt->ival = i; // put values into elt...
}
return static_cast<void *>(storage);
}
void libend(void * id) {
unsigned char *storage = static_cast<unsigned char *>(id); // ok, back cast is valid
delete[] storage; // ok, was allocated by new[]
}
size_t getNElts(void *id) {
size_t *n = reinterpret_cast<size_t *>(id); // ok a size_t was created there
return *n;
}
Elt *getElts(void *id) {
unsigned char *storage = static_cast<unsigned char *>(id); // ok, back cast
Elt* elt = reinterpret_cast<Elt *>(storage + sizeof(Elt)); // ok an Elt was created there
return elt;
}
This is valid C++ code, and it fulfills the C API requirement. The problem is that getElts returns a pointer to a single element object which is not a member of any array. So according to [expr.add], pointer arithmetic based on the return value of getElts invokes UB.
The C++ standard provides nearly zero interoperability guarantees with C.
As far as C++ is concerned, what happens within C code is outside the scope of C++.
So "does this pointer point to an array" is a question the C++ standard cannot answer, as the pointer comes from a C function. Rather it is a question left to your particular compiler.
In practice, this works. In theory, there are no guarantees provided by C++ that your program is well formed when you interact in any way with C.
This is good news, because the C++ standard is broken around creating dynamic arrays of type T. It is so bad that there is no standard-compliant way to implement std::vector without compiler magic, or by the compiler defining the undefined behavior that results from attempting to do it.
However, C++ compilers are free to completely ignore this problem. They are free to define inter-element pointer arithmetic when objects are contiguously allocated to behave just like an array. I am unaware of a compiler that states or guarantees this formally.
Similarly, they are free to provide any guarantees whatsoever about how they treat pointers from C code. And in practice, they do provide quite reasonable behavior when you interact with C code.
I am unaware of any formal guarantees by any compiler.
Pointer arithmetic using the builtin [] operator on pointers is strictly equivalent to doing the pointer arithmetic by hand in the sense that the following is guaranteed:
int arr[2] = { 0, 1 };
assert(arr[1] == *(arr + 1));
The two variants are guaranteed to have the same semantics. As far as your example is concerned, if you know for sure that your API returns a pointer to some contiguous memory, then your code is perfectly valid. This assumption seems perfectly fine given the way the API seems to work. As a side note, I have never seen an allocator that did not allocate contiguous memory on a modern system. It seems like a very silly thing to do, and it does not seem doable given the way C and C++ work (at least not with language support for field accesses). Anyone correct me if I am wrong, though.
getElts returns an address to the beginning of what is an array of something, created in the C library.
getNElts returns the number of elements in that array.
Presumably, you know the exact size of Elt.
Thus, you have all of the information necessary to access your data in C++, using pointer arithmetic if you so choose. It may be technically "undefined", but practically it is not undefined, and it works. This has to be commonly done, especially when dealing with interfaces to hardware.
If you are uncomfortable with going out of bounds on the array that you say is not a C++ array, create an array in C++ and place it at the location returned by getElts. You could even create a std::vector in C++, and memcpy the data pointed to by getElts on to the vector.
something like this:
struct Elt{
int j;
// etc.
};
std::vector<Elt> elts; // create a vector of Elt
size_t n_elts = getNElts(libid); // call library to get number of Elts (libid is the handle from libinit())
elts.resize(n_elts); // resize the vector according to number of elements
Elt* addr = getElts(libid); // get the address of the elements array from the library
std::memcpy(elts.data(), addr, n_elts * sizeof(Elt)); // copy the array over the vector data (needs <cstring>; fine because Elt is trivially copyable)
// there may be better ways to do this copy but this works very well.
// now you can access the elements from the vector.
// using .at for bounds check.
Elt my_elt = elts.at(1);
// not bound checked...
Elt my_elt_2 = elts[2];
You are now working on a copy of the elements contained in a C++ std::vector. If the elements are dynamic from the library, you can 'place' the vector contents at the address returned by the library, and not do the copy. Then you are 'looking' at the memory allocated in the C side.
I'm not sure that all of this is 'defined' behavior, but it will work (I'm not an expert on the standard). You may have other issues with assuring that the Elt structure really lays out the same in your C and C++ implementations, but that can all be worked out.
The bottom line is, there are many ways to do what it appears you are wanting to do. I think you are getting hung up on semantics of pointer arithmetic. Pointer arithmetic is always dangerous, and can lead to undefined behavior, because it is easy to go out of bounds on an array. This is why bare arrays are not recommended practice in C++. There are usually safer ways to do things than using bare arrays.

Why do C and C++ compilers allow array lengths in function signatures when they're never enforced?

This is what I found during my learning period:
#include<iostream>
#include<cstring> // for strlen
using namespace std;
int dis(char a[1])
{
int length = strlen(a);
char c = a[2];
return length;
}
int main()
{
char b[4] = "abc";
int c = dis(b);
cout << c;
return 0;
}
So in the function int dis(char a[1]), the [1] seems to do nothing and doesn't work at all, because I can use a[2]. It behaves just like int a[] or char *a. I know the array name is a pointer and how to pass an array, so my puzzle is not about this part.
What I want to know is why compilers allow this behavior (int a[1]). Or does it have other meanings that I don't know about?
It is a quirk of the syntax for passing arrays to functions.
Actually it is not possible to pass an array in C. If you write syntax that looks like it should pass the array, what actually happens is that a pointer to the first element of the array is passed instead.
Since the pointer does not include any length information, the contents of your [] in the function formal parameter list are actually ignored.
The decision to allow this syntax was made in the 1970s and has caused much confusion ever since...
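That the length is ignored can be checked at compile time. The sketch below uses three illustrative type aliases (not from the question) and shows that all of them name the same function type.

```cpp
#include <cassert>
#include <type_traits>

// Three spellings of the same function type: the array parameter
// is adjusted to a pointer, and the length is discarded.
using WithLength    = int(char[1]);
using WithoutLength = int(char[]);
using WithPointer   = int(char *);

static_assert(std::is_same<WithLength, WithPointer>::value,
              "char[1] parameter is adjusted to char*");
static_assert(std::is_same<WithoutLength, WithPointer>::value,
              "char[] parameter is adjusted to char*");
```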
The length of the first dimension is ignored, but the lengths of the additional dimensions are necessary to allow the compiler to compute offsets correctly. In the following example, foo is declared as taking a two-dimensional array, but what it actually receives is a pointer to the first row (an array of 20 ints).
#include <stdio.h>
void foo(int args[10][20])
{
printf("%zu\n", sizeof(args[0]));
}
int main(int argc, char **argv)
{
int a[2][20];
foo(a);
return 0;
}
The size of the first dimension [10] is ignored; the compiler will not prevent you from indexing off the end (notice that the formal wants 10 elements, but the actual provides only 2). However, the size of the second dimension [20] is used to determine the stride of each row, and here, the formal must match the actual. Again, the compiler will not prevent you from indexing off the end of the second dimension either.
The byte offset from the base of the array to an element args[row][col] is determined by:
sizeof(int)*(col + 20*row)
Note that if col >= 20, then you will actually index into a subsequent row (or off the end of the entire array).
sizeof(args[0]) returns 80 on my machine, where sizeof(int) == 4. However, if I attempt to take sizeof(args), I get the following compiler warning:
foo.c:5:27: warning: sizeof on array function parameter will return size of 'int (*)[20]' instead of 'int [10][20]' [-Wsizeof-array-argument]
printf("%zd\n", sizeof(args));
^
foo.c:3:14: note: declared here
void foo(int args[10][20])
^
1 warning generated.
Here, the compiler is warning that it is only going to give the size of the pointer into which the array has decayed instead of the size of the array itself.
The problem and how to overcome it in C++
The problem has been explained extensively by pat and Matt. The compiler basically ignores the first dimension of the array's size, which means it ignores the size of the passed argument.
In C++, on the other hand, you can easily overcome this limitation in two ways:
using references
using std::array (since C++11)
References
If your function is only trying to read or modify an existing array (not copying it) you can easily use references.
For example, let's assume you want to have a function that resets an array of ten ints setting every element to 0. You can easily do that by using the following function signature:
void reset(int (&array)[10]) { ... }
Not only will this work just fine, but it will also enforce the dimension of the array: passing an array of any other size is a compile-time error.
You can also make use of templates to make the above code generic:
template<class Type, std::size_t N>
void reset(Type (&array)[N]) { ... }
And finally you can take advantage of const correctness. Let's consider a function that prints an array of 10 elements:
void show(const int (&array)[10]) { ... }
By applying the const qualifier we are preventing possible modifications.
The standard library class for arrays
If you consider the above syntax both ugly and unnecessary, as I do, we can throw it in the can and use std::array instead (since C++11).
Here's the refactored code:
void reset(std::array<int, 10>& array) { ... }
void show(std::array<int, 10> const& array) { ... }
Isn't it wonderful? Not to mention that the generic code trick shown earlier still works:
template<class Type, std::size_t N>
void reset(std::array<Type, N>& array) { ... }
template<class Type, std::size_t N>
void show(const std::array<Type, N>& array) { ... }
Not only that, but you get copy and move semantics for free. :)
template<class Type, std::size_t N>
void copy(std::array<Type, N> array) {
// a copy of the original passed array
// is made and can be dealt with independently
// from the original
}
So, what are you waiting for? Go use std::array.
It's a fun feature of C that allows you to effectively shoot yourself in the foot if you're so inclined. I think the reason is that C is just a step above assembly language. Size checking and similar safety features have been removed to allow for peak performance, which isn't a bad thing if the programmer is being very diligent. Also, assigning a size to the function argument has the advantage that when the function is used by another programmer, there's a chance they'll notice a size restriction. Just using a pointer doesn't convey that information to the next programmer.
First, C never checks array bounds. Doesn't matter if they are local, global, static, parameters, whatever. Checking array bounds means more processing, and C is supposed to be very efficient, so array bounds checking is done by the programmer when needed.
Second, there is a trick that makes it possible to pass-by-value an array to a function. It is also possible to return-by-value an array from a function. You just need to create a new data type using struct. For example:
typedef struct {
int a[10];
} myarray_t;
myarray_t my_function(myarray_t foo) {
myarray_t bar;
...
return bar;
}
You have to access the elements like this: foo.a[1]. The extra ".a" might look weird, but this trick adds great functionality to the C language.
In C99 and later (this syntax is not valid C++), you can tell the compiler that myArray points to an array of at least 10 ints:
void bar(int myArray[static 10])
A good compiler can then warn you if a caller passes a null pointer or an array with fewer than 10 elements. Without the "static" keyword, the 10 would mean nothing at all.
This is a well-known "feature" of C, passed over to C++ because C++ is supposed to correctly compile C code.
Problem arises from several aspects:
An array name is supposed to be completely equivalent to a pointer.
C is supposed to be fast, originally developed to be a kind of "high-level Assembler" (especially designed to write the first "portable Operating System": Unix), so it is not supposed to insert "hidden" code; runtime range checking is thus "forbidden".
Machine code generated to access a static array or a dynamic one (either in the stack or allocated) is actually different.
Since the called function cannot know the "kind" of array passed as argument everything is supposed to be a pointer and treated as such.
You could say arrays are not really supported in C (this is not really true, as I was saying before, but it is a good approximation); an array is really treated as a pointer to a block of data and accessed using pointer arithmetic.
Since C does NOT have any form of RTTI, you have to declare the size of the array element in the function prototype (to support pointer arithmetic). This is even "more true" for multidimensional arrays.
Anyway all above is not really true anymore :p
Most modern C/C++ compilers do support some form of bounds checking, but it is off by default. Reasonably recent versions of gcc, for example, do compile-time range checking with "-O3 -Wall -Wextra" and can add run-time checks with sanitizer options such as "-fsanitize=address" or "-fsanitize=bounds".
C will not only transform a parameter of type int[5] into int*; given the declaration typedef int intArray5[5];, it will transform a parameter of type intArray5 to int* as well. There are some situations where this behavior, although odd, is useful (especially with things like the va_list defined in stdarg.h, which some implementations define as an array). It would be illogical to allow as a parameter a type defined as int[5] (ignoring the dimension) but not allow int[5] to be specified directly.
I find C's handling of parameters of array type to be absurd, but it's a consequence of efforts to take an ad-hoc language, large parts of which weren't particularly well-defined or thought-out, and try to come up with behavioral specifications that are consistent with what existing implementations did for existing programs. Many of the quirks of C make sense when viewed in that light, particularly if one considers that when many of them were invented, large parts of the language we know today didn't exist yet. From what I understand, in the predecessor to C, called BCPL, compilers didn't really keep track of variable types very well. A declaration int arr[5]; was equivalent to int anonymousAllocation[5],*arr = anonymousAllocation;. Once the allocation was set aside, the compiler neither knew nor cared whether arr was a pointer or an array. When accessed as either arr[x] or *arr, it would be regarded as a pointer regardless of how it was declared.
One thing that hasn't been answered yet is the actual question.
The answers already given explain that arrays cannot be passed by value to a function in either C or C++. They also explain that a parameter declared as int[] is treated as if it had type int *, and that a variable of type int[] can be passed to such a function.
But they don't explain why it has never been made an error to explicitly provide an array length.
void f(int *); // makes perfect sense
void f(int []); // sort of makes sense
void f(int [10]); // makes no sense
Why isn't the last of these an error?
One reason is that making it an error would cause problems with typedefs.
typedef int myarray[10];
void f(myarray array);
If it were an error to specify the array length in function parameters, you would not be able to use the myarray name in the function parameter. And since some implementations use array types for standard library types such as va_list, and all implementations are required to make jmp_buf an array type, it would be very problematic if there were no standard way of declaring function parameters using those names: without that ability, there could not be a portable implementation of functions such as vprintf.
It's allowed so that compilers are able to check whether the size of the array passed matches what is expected. Compilers may issue a warning if it does not.

How do you declare a pointer to a function that returns a pointer to an array of int values in C / C++?

Is this correct?
int (*(*ptr)())[];
I know this is trivial, but I was looking at an old test about these kinds of constructs, and this particular combination wasn't on the test and it's really driving me crazy; I just have to make sure. Is there a clear, solid, and understandable rule for these kinds of declarations?
(ie: pointer to... array of.. pointers to... functions that.... etc etc)
Thanks!
R
The right-left rule makes it easy.
int (*(*ptr)())[]; can be interpreted as follows:
Start from the variable name ------------------------------- ptr
Nothing to the right but ), so go left to find * ---------- is a pointer
Jump out of the parentheses and encounter () ----------- to a function that takes no arguments (in C, an unspecified number of arguments)
Go left, find * ------------------------------------------------ and returns a pointer
Jump out of the parentheses, go right and hit [] ---------- to an array of
Go left again, find int ------------------------------------- ints.
In almost all situations where you want to return a pointer to an array, the simplest thing to do is to return a pointer to the first element of the array. This pointer can be used in the same contexts as an array name and provides no more or less indirection than returning a pointer of type "pointer to array"; indeed, it will hold the same pointer value.
If you follow this you want a pointer to a function returning a pointer to an int. You can build this up (construction of declarations is easier than parsing).
Pointer to int:
int *A;
Function returning pointer to int:
int *fn();
pointer to function returning a pointer to int:
int *(*pfn)();
If you really want a pointer to a function returning a pointer to an array of int, you can follow the same process.
Array of int:
int A[];
Pointer to array of int:
int (*p)[];
Function returning pointer ... :
int (*fn())[];
Pointer to fn ... :
int (*(*pfn)())[];
which is what you have.
You don't. Just split it up into two typedefs: one for pointer to int array, and one for pointer to functions. Something like:
typedef int (*IntArrayPtr_t)[];
typedef IntArrayPtr_t (*GetIntArrayFuncPtr_t)(void);
This is not only more readable, it also makes it easier to declare/define the functions that you are going to assign the variable:
IntArrayPtr_t GetColumnSums(void)
{ .... }
Of course this assumes this was a real-world situation, and not an interview question or homework. I would still argue this is a better solution for those cases, but that's only me. :)
If you feel like cheating:
typedef int(*PtrToArray)[5];
PtrToArray function();
int i = function;
Compiling that on gcc yields: invalid conversion from 'int (*(*)())[5]' to 'int'. The first bit is the type you're looking for.
Of course, once you have your PtrToArray typedef, the whole exercise becomes rather more trivial, but sometimes this comes in handy if you already have the function name and you just need to stick it somewhere. And, for whatever reason, you can't rely on template trickery to hide the gory details from you.
If your compiler supports it, you can also do this:
typedef int(*PtrToArray)[5];
PtrToArray function();
template<typename T> void print(T) {
std::cout << __PRETTY_FUNCTION__ << std::endl;
}
print(function);
Which, on my computer box, produces void print(T) [with T = int (* (*)())[5]]
Being able to read the types is pretty useful, since understanding compiler errors often depends on your ability to figure out what all those parentheses mean. But writing them yourself is less useful, IMO.
Here's my solution...
int** (*func)();
A function pointer returning int**, which can point to the first element of an array of int*'s. It isn't as complicated as your solution.
Using cdecl you get the following
cdecl> declare a as pointer to function returning pointer to array of int;
Warning: Unsupported in C -- 'Pointer to array of unspecified dimension'
(maybe you mean "pointer to object")
int (*(*a)())[]
This question from C-faq is similar but provides 3 approaches to solve the problem.