From Programming Language Pragmatics, by Scott
For systems programming, or to facilitate the writing of
general-purpose container (collection) objects (lists, stacks,
queues, sets, etc.) that hold references to other objects, several
languages provide a universal reference type. In C and C++, this
type is called void *. In Clu it is called any; in Modula-2,
address; in Modula-3, refany; in Java, Object; in C#, object.
In C and C++, how does void * work as a universal reference type?
void * is always only a pointer type, while a universal reference type contains all values, both pointers and nonpointers. So I can't see how void * is a universal reference type.
Thanks.
A void* pointer will generally hold any pointer that is not a C++ pointer-to-member. It's rather inconvenient in practice, since you need to cast it to another pointer type before you can use it. You also need to convert it to the same pointer type that it was converted from to make the void*, otherwise you risk undefined behavior.
A good example would be the qsort function. It takes a void* pointer as a parameter, meaning it can point to an array of anything. The comparison function you pass to qsort must know how to cast two void* pointers back to the types of the array elements in order to compare them.
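To make the qsort point concrete, here is a minimal sketch, assuming the array holds plain ints. The names compare_ints and sort_ints are made up for illustration; only std::qsort comes from the standard library.

```cpp
#include <cassert>
#include <cstdlib>   // std::qsort, std::size_t

// Comparison callback: qsort hands over two void pointers that really point
// at ints, so they are converted back to const int* before dereferencing.
int compare_ints(const void* lhs, const void* rhs) {
    const int a = *static_cast<const int*>(lhs);
    const int b = *static_cast<const int*>(rhs);
    return (a > b) - (a < b);   // negative, zero, or positive
}

// Hypothetical helper: sorts an int array in place through the void* interface.
void sort_ints(int* values, std::size_t count) {
    assert(values != nullptr || count == 0);
    std::qsort(values, count, sizeof(int), compare_ints);
}
```

qsort itself never learns the element type. It only knows the element size and trusts the callback to cast back correctly.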
The crux of your confusion is that neither an instance of void * nor an instance of Modula-3's refany, nor an instance of any other language's "can refer to anything" type, contains the object that it refers to. A variable of type void * is always a pointer and a variable of type refany is always a reference. But the object that they refer to can be of any type.
A purist of programming-language theory would tell you that C does not have references at all, because pointers are not references. It has a nearly-universal pointer type, void *, which can point to an object of any type (including integers, aggregates, and other pointers). As a common but not ubiquitous extension, it can also point to any function (functions are not objects).
The purist would also tell you that C++ does not have a (nearly-)universal pointer type, because of its stricter type system, and doesn't have a universal reference type either.
They would also say that the book you are reading is being sloppy with its terminology, and they would caution you to not take any one such book for the gospel truth on terminological matters, or any other matters. You should instead read widely in both books and CS journals and conference proceedings (collectively known as "the literature") until you gain an "ear" for what is generally-agreed-on terminology, what is specific to a subdiscipline or a community of practice, and so on.
And finally they would remind you that C and C++ are two different languages, and anyone who speaks of them in the same breath is either glossing over the distinctions (which may or may not be relevant in context), decades out of date, or both.
Probably the reason is that you can take the address of a variable of any type and convert it to void*.
It works by a silent contract that you know the actual type of the object.
So you can store different kinds of elements in a container, but you need to somehow know what is what when taking elements back, to interpret them correctly.
The only convenience void* offers is that it's idiomatic for this, i.e. it's clear that dereferencing the pointer makes no sense, and void* is implicitly convertible to any object pointer type. That is for C.
In C++, type-erasure techniques are preferred, or special types like any (there is a Boost version of this too).
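As a rough illustration of the "special types like any" route, here is a sketch that assumes C++17 and std::any. The helper name count_ints is invented for the example.

```cpp
#include <any>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: counts how many elements of a mixed container hold an int.
int count_ints(const std::vector<std::any>& items) {
    int n = 0;
    for (const std::any& item : items) {
        // The pointer form of any_cast returns nullptr on a type mismatch,
        // so a wrong guess is detectable instead of being undefined behavior.
        if (std::any_cast<int>(&item) != nullptr) {
            ++n;
        }
    }
    return n;
}
```

Unlike void*, the container remembers each element's type, so the "silent contract" is checked at run time.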
void* is nothing more than a pointer: it holds the address of an object (or of an array, and so on).
When your program is running, every variable has its own address in memory, right? A pointer is something that points to that address.
Normally, a pointer's type matches the type of the object it points to, for example int b = 5; int* p = &b;. But that is the case where you know the specific type.
But sometimes you just want to record that something is stored somewhere in memory; as long as you know the type of the object at that address, you can easily cast back. For example, in the OpenCV library, which I am learning, many functions let the user pass such a pointer as an argument instead of declaring global variables. It is used mostly in callback functions, like this:
void onChange(int v, void *ptr)
Here, the library does not care what ptr points to. It only knows that if you pass an address when you call the function, like onChange(5, &b), then you must cast ptr back to the same pointer type before using it: int b = *static_cast<int*>(ptr);
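The same pattern can be sketched without OpenCV. Everything below (ChangeCallback, fire_change, on_change) is a made-up stand-in for illustration, not an actual OpenCV API.

```cpp
#include <cassert>

// Hypothetical callback type in the style of onChange(int, void*).
using ChangeCallback = void (*)(int value, void* user_data);

// Stand-in for the library side: it forwards user_data without looking at it.
void fire_change(ChangeCallback callback, int value, void* user_data) {
    assert(callback != nullptr);
    callback(value, user_data);
}

// The callback knows user_data was created from an int*, so it casts back
// to int* (the same type) before dereferencing.
void on_change(int value, void* user_data) {
    int* total = static_cast<int*>(user_data);
    *total += value;
}
```

With int b = 5;, calling fire_change(on_change, 5, &b); leaves b equal to 10, and no global variable was needed.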
Probably this explanation from Understanding and Using C Pointers by Richard Reese will help
A pointer to void is a general-purpose pointer used to hold references to any data type.
It has two interesting properties:
A pointer to void will have the same representation and memory alignment as a pointer to char
A pointer to void will never be equal to another pointer. However, two void pointers assigned a NULL value will be equal.
Any pointer can be assigned to a pointer to void. It can then be cast back to its original pointer type. When this happens the value will be equal to the original pointer value.
This is illustrated in the following sequence, where a pointer to
int is assigned to a pointer to void and then back to a pointer to int
#include <stdio.h>
int main(void)
{
    int num = 100;
    int *pi = &num;
    printf("value of pi is %p\n", (void *)pi);
    void *pv = pi;
    pi = (int *)pv;
    printf("value of pi is %p\n", (void *)pi);
    return 0;
}
Pointers to void are used for data pointers, not function pointers
In school, our lecturer taught us that the entire array is passed by reference when we pass it to a function.
However, I recently read a book which says that arrays are passed by pointer by default when passing the entire array to a function. The book further mentions that "passing by pointer is very similar to passing by reference", which means that passing by pointer and passing by reference are actually different.
It appears that different sources state different things.
So my question is: In C++, are arrays passed by reference or by pointer when we pass the entire array to a function?
For Example:
void funcA(int []); //Function Declaration
int main()
{
int array[5];
funcA(array); //Is array passed by ref or by pointer here?
}
At worst, your lecturer is wrong. At best, he was simplifying terminology, and confusing you in the process. This is reasonably commonplace in software education, unfortunately. The truth is, many books get this wrong as well; the array is not "passed" at all, either "by pointer" or "by reference".
In fact, because arrays cannot be passed by value due to an old C restriction, there is some special magic that happens with arrays as function arguments.
The function declaration:
void funcA(int[]);
is silently translated into the following:
void funcA(int*);
and when you write this:
funcA(myArray);
it is silently translated into the following:
funcA(&myArray[0]);
The result is that you're not passing the array at all; you pass a pointer to its first element.
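If you want to see both silent translations for yourself, a small sketch like the following should do. The function names are invented for the example.

```cpp
#include <cassert>
#include <type_traits>

// The array form of a parameter is adjusted to a pointer, so both spellings
// name the same function type.
static_assert(std::is_same<void(int[]), void(int*)>::value,
              "an int[] parameter means int*");

// An array type decays to a pointer to its element type.
static_assert(std::is_same<std::decay<int[5]>::type, int*>::value,
              "int[5] decays to int*");

// Hypothetical helper: returns the pointer it actually received.
const int* received_pointer(const int arr[]) { return arr; }

// True if passing the array name hands over the address of element 0.
bool receives_first_element_address() {
    int my_array[5] = {};
    return received_pointer(my_array) == &my_array[0];
}
```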
Now, at certain levels of abstraction/simplification, you can call this "passing an array by pointer", "passing an array by reference" or even "passing a handle to an array", but if you want to talk in C++ terms, none of those phrases are accurate.
The terminology used by your lecturer is confusing. However, in a function declaration such as
void funcA(int []);
the int[] is just another way of saying int*. So funcA can take any argument that is or can be converted to an int*.
Arrays can decay to pointers to the first element in the right context. This means, for example, that you can assign an array's name to a pointer like this:
int array[42]; // array is of type int[42]
int * arr = array; // array decays to int*
So, when you pass array to funcA,
funcA(array); // array decays to int*
funcA has a pointer to the first element of the array.
But it is also possible to pass arrays by reference. It just requires a different syntax. For example
void funcB(int (&arr)[42]);
So, in your example, you are passing a pointer to the first element of the array, due to the signature of your function funcA. If you called funcB(array), you would be passing a reference.
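A short sketch of what the reference form buys you, assuming the same bound of 42. The names bytes_in and element_count are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Reference to an array of exactly 42 ints: no decay happens, so sizeof
// still sees the whole array rather than a pointer.
std::size_t bytes_in(int (&arr)[42]) { return sizeof(arr); }

// A template can deduce the bound N from the argument instead of fixing it.
template <std::size_t N>
constexpr std::size_t element_count(const int (&)[N]) { return N; }
```

Passing an int[41] to bytes_in would not compile, because the bound is part of the parameter type.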
Pass-by-pointer is a bit of a misnomer. It doesn't happen in C++. There is only pass-by-value and pass-by-reference. Pointers in particular are passed by value.
The answer to your question is: it depends.
Consider the following signatures:
void foo(int *arr);
void bar(int *&arr);
void baz(int * const &arr);
void quux(int (&arr)[42]);
Assuming you are passing an array to each of these functions:
In foo(arr), your array is decayed to a pointer, which is then passed by value.
In bar(arr), this is a compiler error, because your array would decay to a (temporary) pointer, and this would be passed by reference. This is nearly always a bug, since the reason you would want a mutable reference is to change the value of the referent, and that would not be what would happen (you would change the value of the temporary instead). I add this since this actually does work on some compilers (MSVC++) with a particular extension enabled. If you instead decay the pointer manually, then you can pass that instead (e.g. int *p = arr; bar(p);)
In baz(arr), your array decays to a temporary pointer, which is passed by (const) reference.
In quux(arr), your array is passed by reference.
What your book means by them being similar is that passing a pointer by value and passing a reference are usually implemented identically. The difference is purely at the C++ level: with a reference, you do not have the value of the pointer (and hence cannot change it), and it is guaranteed to refer to an actual object (unless you broke your program earlier).
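The difference between passing a pointer by value and by reference shows up as soon as the callee reassigns the pointer. A minimal sketch, reusing the foo and bar signatures from above with invented bodies:

```cpp
#include <cassert>

// foo receives a copy of the pointer; reassigning it is invisible to the caller.
void foo(int* arr) { arr = nullptr; }

// bar receives a reference to the caller's pointer; reassigning it changes
// the caller's pointer.
void bar(int*& arr) { arr = nullptr; }

// Hypothetical check: decays the array manually, then calls both functions.
bool pointer_by_value_vs_reference() {
    int data[3] = {1, 2, 3};
    int* p = data;
    foo(p);
    const bool unchanged_after_foo = (p == data);
    bar(p);
    return unchanged_after_foo && p == nullptr;
}
```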
When reading about const_cast I came across sentences like the following:
Only the following conversions can be done with const_cast. In particular, only const_cast may be used to cast away (remove) constness or volatility.
1) Two possibly multilevel pointers to the same type may be converted between each other, regardless of cv-qualifiers at each level.
I've googled around a bit already and haven't found any concise, straightforward definitions of what a multilevel pointer is. So: what exactly is a multilevel pointer?
(Possible face-palm moment) Is it just a pointer to a pointer, or pointer to a pointer to a pointer, e.g. int ** or int ***?
Is it just a pointer to a pointer, or pointer to a pointer to a pointer, e.g. int ** or int ***?
It is exactly this, yes.
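For what it's worth, here is a small sketch of const_cast working on a two-level pointer. The helper names are made up, and the pointed-to int is deliberately not declared const, so writing through the result is fine.

```cpp
#include <cassert>

// Hypothetical helper: removes const at both levels of a two-level pointer.
int** strip_const(const int* const* pp) {
    return const_cast<int**>(pp);
}

bool round_trip_through_const() {
    int value = 7;                      // not a const object
    int* p = &value;
    const int* const* view = &p;        // implicit conversion adds const at both levels
    int** back = strip_const(view);     // const_cast removes it again
    **back = 8;
    return value == 8 && back == &p;
}
```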
I have some problems using two dimensional array.
static const int PATTERNS[20][4];
static void init_PATTERN()
{
// problem #1
int (&patterns)[20][4] = const_cast<int[20][4]>(PATTERNS);
...
}
extern void UsePattern(int a, const int** patterns, int patterns_size);
// problem #2
UsePattern(10, PATTERNS, sizeof(PATTERNS)/sizeof(PATTERNS[0]));
In the first statement, I need to cast the const off the two dimensional array PATTERNS. The reason for this is that the init function is called only once, and in the remaining code, PATTERNS is strictly read-only.
In the second statement, I need to pass the PATTERNS array to the int** argument. Passing it directly resulted in a compile error.
I solved the problem at about the same time that @Andrey posted the answer. Yes, int[][] can't be cast to int**.
It can be converted to int* through &(PATTERNS[0][0]), and the function prototype must be modified to take the row size (the number of elements in a row). The const can be cast away from the array with const_cast using reference syntax.
Firstly, there's no such thing as cast to array type (or to function type) in C++. Yet this is what you are trying to do. If you want to cast away constness from something, you have to cast to either pointer or reference type. In your case you have a reference on the receiving end of the cast, so the cast itself has to be to reference type as well
int (&patterns)[20][4] = const_cast<int (&)[20][4]>(PATTERNS);
Of course, as Bill already noted, casting away constness from a constant object (and then attempting to modify the object) leads to undefined behavior.
Secondly, a two-dimensional array cannot be passed anywhere as an int ** pointer. If you want to pass your PATTERNS somewhere, you can pass it as const int (&)[20][4], const int (*)[20][4], const int [][4], const int (*)[4] or something similar to that, but not as int **. Do a search on SO and/or read some FAQ on arrays to understand why. This has been explained too many times to repeat it again.
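As a sketch of one of those alternatives, here is a function taking const int (*)[4]. The name sum_patterns and its body are invented; it only stands in for whatever UsePattern really does.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical replacement signature: a pointer to rows of 4 const ints.
// The row length is part of the type, so patterns[r][c] indexes correctly.
int sum_patterns(const int (*patterns)[4], std::size_t rows) {
    assert(patterns != nullptr || rows == 0);
    int sum = 0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < 4; ++c) {
            sum += patterns[r][c];
        }
    }
    return sum;
}
```

A const int[20][4] array decays to exactly this pointer type, so it can be passed by name together with sizeof(PATTERNS)/sizeof(PATTERNS[0]) as the row count.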
When you declare PATTERNS as const, the compiler may set it up in read-only memory. You can't safely cast away const unless the item was originally declared without const.
I'm guessing that your compiler error was cannot convert 'int (*)[4]' to 'int**' for argument '2' to 'void UsePattern(int, int**, int)'?
AndreyT's answer is perfect. I'd only like to add that I believe you would be better using a class that does the init_PATTERN() work in the constructor and overrides the operator[] to give readonly access to the array elements.
This, of course, assuming you can change the UsePattern function to get a reference to such class instead of a pointer to int array.
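A minimal sketch of such a class might look like this. The class name Patterns and the fill values are placeholders, since the real init_PATTERN() logic isn't shown in the question.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical wrapper: filled once in the constructor, read-only afterwards.
class Patterns {
public:
    static constexpr std::size_t kRows = 20;
    static constexpr std::size_t kCols = 4;

    Patterns() {
        // Placeholder fill standing in for the real initialization.
        for (std::size_t r = 0; r < kRows; ++r) {
            for (std::size_t c = 0; c < kCols; ++c) {
                data_[r][c] = static_cast<int>(r * kCols + c);
            }
        }
    }

    // Returns a pointer to a const row, so callers can read patterns[r][c]
    // but cannot write through it.
    const int* operator[](std::size_t row) const {
        assert(row < kRows);
        return data_[row];
    }

private:
    int data_[kRows][kCols];
};
```

This way no const_cast is needed at all: the data is writable only inside the constructor.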
C++ Arrays are complicated. You can't just throw them around and expect them to work like in some languages. The only way to initialize an array from another array is to navigate a for loop and copy each item individually. This goes doubly for two-dimensional arrays (meaning you'll need two for loops).
It seems like you're trying to make this more complicated than it needs to be. For instance, if the set of values you will be assigning to PATTERNS will be the same every time you run the program, you can initialize a two dimensional variable like this:
static const int foo[2][3] = {{11,12,13},{21,22,23}};
If the set of values assigned to PATTERNS varies from one execution to the next, then you should probably try to find a different way to approach the problem. I would probably wrap the data in a class, especially if your intention is to use similarly-sized two-dimensional arrays elsewhere in the code.
I'm relatively new to C++ (about one year of experience, on and off). I'm curious about what led to the decision of type * name as the syntax for defining pointers. It seems to me that the syntax should be type & name as the & symbol is used everywhere else in code to refer to the variable's memory address. So, to use the traditional example of int pointers:
int a = 1;
int * b = &a;
would become
int a = 1;
int & b = &a
I'm sure there's some reason for this that I'm just not seeing, and I'd love to hear some input from C++ veterans.
Thanks,
-S
C++ adopts the C syntax. As revealed in "The Development of the C Language" (by Dennis Ritchie) C uses * for pointers in type declarations because it was decided that type syntax should follow use.
For each object of [a compound type], there was already a way to mention the underlying object: index the array, call the function, use the indirection operator [*] on the pointer. Analogical reasoning led to a declaration syntax for names mirroring that of the expression syntax in which the names typically appear. Thus,
int i, *pi, **ppi;
declare an integer, a pointer to an integer, a pointer to a pointer to an integer. The syntax of these declarations reflects the observation that i, *pi, and **ppi all yield an int type when used in an expression.
Here's a more complex example:
int *(*foo)[4][];
This declaration means an expression *(*foo)[4][0] has type int, and from that (and that [] has higher precedence than unary *) you can decode the type: foo is a pointer to an array of size 4 of array of pointers to ints.
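You can let the compiler confirm the "declaration mirrors use" reading. In this sketch I've assumed a second bound of 2 (an arbitrary choice of mine), so that the array type is complete.

```cpp
#include <type_traits>

// Same shape as the example above, with an assumed inner bound of 2.
int* (*foo)[4][2] = nullptr;

// foo is a pointer to an array of 4 arrays of 2 pointers to int.
static_assert(std::is_same<decltype(foo), int* (*)[4][2]>::value,
              "type decoded from the declaration");

// Using the name the way it was declared yields an int. decltype reports
// int& because the expression is an lvalue; nothing is evaluated here.
static_assert(std::is_same<decltype(*(*foo)[3][0]), int&>::value,
              "declaration mirrors use");
```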
This syntax was adopted in C++ for compatibility with C. Also, don't forget that C++ has a use for & in declarations.
int & b = a;
The above line declares a reference variable referring to another variable of type int. Roughly, the difference between a reference and a pointer is that a reference must be initialized when it is declared, you cannot change what it refers to afterwards, and it is always dereferenced automatically.
int x = 5, y = 10;
int& r = x;
int sum = r + y; // no need to write '*r'; r is dereferenced automatically
r = y; // does NOT make r refer to y: it assigns y's value to x, because r stays bound to x for its whole life
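To check what assigning to a reference really does, a quick sketch (the function name is made up):

```cpp
#include <cassert>

// Returns true if assigning to a reference writes through to the original
// variable instead of rebinding the reference.
bool assignment_writes_through() {
    int x = 5, y = 10;
    int& r = x;
    r = y;      // copies y's value into x; r still refers to x
    y = 20;     // changing y afterwards does not affect r
    return x == 10 && &r == &x && r != y;
}
```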
I think that Dennis Ritchie answered this in The Development of the C Language:
For each object of such a composed
type, there was already a way to
mention the underlying object: index
the array, call the function, use the
indirection operator on the pointer.
Analogical reasoning led to a
declaration syntax for names mirroring
that of the expression syntax in which
the names typically appear. Thus,
int i, *pi, **ppi;
declare an integer, a pointer to an
integer, a pointer to a pointer to an
integer. The syntax of these
declarations reflects the observation
that i, *pi, and **ppi all yield an
int type when used in an expression.
Similarly,
int f(), *f(), (*f)();
declare a function returning an
integer, a function returning a
pointer to an integer, a pointer to a
function returning an integer;
int *api[10], (*pai)[10];
declare an array of pointers to
integers, and a pointer to an array of
integers. In all these cases the
declaration of a variable resembles
its usage in an expression whose type
is the one named at the head of the
declaration.
So we use type * var to declare a pointer because this allows the declaration to mirror the usage (dereferencing) of the pointer.
In this article, Ritchie also recounts that in "NB", an extended version of the "B" programming language, he used int pointer[] to declare a pointer to an int, as opposed to int array[10] to declare an array of ints.
If you are a visual thinker, it may help to imagine the asterisk as a black hole leading to the data value. Hence, it is a pointer.
The ampersand is the opposite end of the hole, think of it as an unraveled asterisk or a spaceship wobbling about in an erratic course as the pilot gets over the transition coming out of the black hole.
I remember being very confused by C++ overloading the meaning of the ampersand, to give us references. In their desperate attempt to avoid using any more characters, which was justified by the international audience using C and known issues with keyboard limitations, they added a major source of confusion.
One thing that may help in C++ is to think of references as pre-prepared dereferenced pointers. Rather than using &someVariable when you pass in an argument, you've already used the trailing ampersand when you defined someVariable. Then again, that might just confuse you further!
One of my pet hates, which I was unhappy to see promulgated in Apple's Objective-C samples, is the layout style int *someIntPointer instead of int* someIntPointer
IMHO, keeping the asterisk with the variable is an old-fashioned C approach emphasizing the mechanics of how you define the variable, over its data type.
The data type of someIntPointer is literally a pointer to an integer and the declaration should reflect that. This does lead to the requirement that you declare one variable per line, to avoid subtle bugs such as:
int* a, b; // b is a straight int, was that our intention?
int *a, *b; // old-style C declaring two pointers
int* a;
int* b; // b is another pointer to an int
Whilst people argue that the ability to declare mixed pointers and values on the same line, intentionally, is a powerful feature, I've seen it lead to subtle bugs and confusion.
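If in doubt about what a mixed declaration really declares, the compiler can tell you. A tiny sketch using the first line from above:

```cpp
#include <type_traits>

int* a, b;   // the * binds to a only

static_assert(std::is_same<decltype(a), int*>::value, "a is a pointer to int");
static_assert(std::is_same<decltype(b), int>::value, "b is a plain int");
```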
Your second example is not valid C code, only C++ code. The difference is that one is a pointer, whereas the other is a reference.
On the right-hand side the '&' always means address-of. In a definition it indicates that the variable is a reference.
On the right-hand side the '*' always means value-at-address. In a definition it indicates that the variable is a pointer.
References and pointers are similar, but not the same. This article addresses the differences.
Instead of reading int* b as "b is a pointer to int", read it as int *b: "*b is an int". Then, you have & as an anti-*: *b is an int. The address of *b is &*b, or just b.
I think the answer may well be "because that's the way K&R did it."
K&R are the ones who decided what the C syntax for declaring pointers was.
It's not int & x; instead of int * x; because that's the way the language was defined by the guys who made it up -- K&R.