printf and pointers [duplicate] - c++

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Correct format specifier to print pointer (address)?
When printing a pointer using printf, is it necessary to cast the pointer to void *? In other words, in code like
#include <stdio.h>
int main() {
int a;
printf("address of a = %p\n", &a);
}
should the argument really be (void *) &a? gcc doesn't seem to give any warnings when no explicit cast is made.

Yes, the cast to void* is required.
int a;
printf("address of a = %p\n", &a);
&a is of type int*; printf's "%p" format requires an argument of type void*. The int* argument is not implicitly converted to void*, because the declaration of printf doesn't provide type information for parameters other than the first (the format string). All arguments after the format string have the default argument promotions applied to them; these promotions do not convert int* to void*.
The likely result is that printf sees an argument that's really of type int* and interprets it as if it were of type void*. This is type-punning, not conversion, and it has undefined behavior. It will likely happen to work if int* and void* happen to have the same representation, but the language standard does not guarantee that, even by implication. And the type-punning I described is only one possible behavior; the standard says literally nothing about what can happen.
(If you do the same thing with a non-variadic function with a visible prototype, so the compiler knows at the point of the call that the parameter is of type void*, then it will generate code to do an implicit int*-to-void* conversion. That's not the case here.)

Is this a C or a C++ question? For C++, it seems that according to 5.2.2 [expr.call] paragraph 7 there isn't any implicit conversion to void*. It seems that C99's 6.5.2.2 paragraph 6 also doesn't imply any explicit promotion of pointer types. This would mean that an explicit cast to void* is required as pointer types can have different size (at least in C++): if the layout of the different pointer types isn't identical you'd end up with undefined behavior. Can someone point out where it is guaranteed that a pointer is passed with the appropriate size when using variable argument lists?
Of course, being a C++ programmer this isn't much of a problem: just don't use functions with variable number of arguments. That's not a viable approach in C, though.

I think it might be necessary to cast. Are we certain that the size of pointers is always the same? I'm sure I read on stackoverflow recently that the size (or maybe just the alignment?) of a struct* can be different to that of a union*. This would suggest that one or both can be different from the size of a void*.
So even if the value doesn't change much, or at all, in the conversion, maybe the cast is needed to ensure the size of the pointer itself is correct.
In print, %p expects a void* so you should explicitly cast it. If you don't do so, and if you are lucky then the pointer size and pointer representation might save the day. But you should explicitly cast it to be certain - anything else is technically undefined behaviour.

Related

Why can't I static_cast a void* to a pointer-to-function?

This can be compiled (despite being UB (right?) because fvp == nullptr)
int f;
void* fvp{};
decltype(f)* fp = static_cast<decltype(f)*>(fvp);
but this cannot
void f() {}
void* fvp{};
decltype(f)* fp = static_cast<decltype(f)*>(fvp);
because (Clang says)
Static_cast from 'void *' to 'decltype(f) ' (aka 'void ()()') is not allowed [bad_cxx_cast_generic]
Why is this the case? And where from the standard do I understand this? I guess [expr.static.cast] is where I should look at; specifically 7, but I'm not sure.
Follow up question on the reason why I wanted to ask this one.
This is not allowed because [expr.static_cast] does not allow it. Indeed, the relevant language is:
No other conversion shall be performed explicitly using a static_­cast.
Since there is no conversion listed that would permit conversion from void* to a function pointer type, that conversion is not allowed.
The rationale is that function and data pointers may have different sizes or alignment requirements.
You might consider a reinterpret_cast, which conditionally supports such conversions. But it would be better to use a design that does not require such conversions. If you need to type-erase a function pointer, you will still need reinterpret_cast but can use e.g. void(*)() as the erased type, which is always supported.
where from the standard do I understand this? I guess [expr.static.cast] is where I should look at; specifically 7, but I'm not sure
Basically, there is no guarantee for a full-round trip between void* and a function pointer.
This can be seen from expr.reinterpret.cast#8 which says:
Converting a function pointer to an object pointer type or vice versa is conditionally-supported. The meaning of such a conversion is implementation-defined, except that if an implementation supports conversions in both directions, converting a prvalue of one type to the other type and back, possibly with different cv-qualification, shall yield the original pointer value.

Advantage of static_cast<void*>() to see the memory address? [duplicate]

In a recent question, someone mentioned that when printing a pointer value with printf, the caller must cast the pointer to void *, like so:
int *my_ptr = ....
printf("My pointer is: %p", (void *)my_ptr);
For the life of me I can't figure out why. I found this question, which is almost the same. The answer to question is correct - it explains that ints and pointers are not necessarily the same length.
This is, of course, true, but when I already have a pointer, like in the case above, why should I cast from int * to void *? When is an int * different from a void *? In fact, when does (void *)my_ptr generate any machine code that's different from simply my_ptr?
UPDATE:
Multiple knowledgeable responders quoted the standard, saying passing the wrong type may result in undefined behavior. How? I expect printf("%p", (int *)ptr) and printf("%p", (void *)ptr) to generate the exact same stack-frame. When will the two calls generate different stack frames?
The %p conversion specifier requires an argument of type void *. If you don't pass an argument of type void *, the function call invokes undefined behavior.
From the C Standard (C11, 7.21.6.1p8 Formatted input/output functions):
"p - The argument shall be a pointer to void."
Pointer types in C are not required to have the same size or the same representation.
An example of an implementation with different pointer types representation is Cray PVP where the representation of pointer types is 64-bit for void * and char * but 32-bit for the other pointer types.
See "Cray C/C++ Reference Manual", Table 3. in "9.1.2.2" http://docs.cray.com/books/004-2179-003/004-2179-003-manual.pdf
In C language all pointer types potentially differ in their representations. So, yes, int * is different from void *. A real-life platform that would illustrate this difference might be difficult (or impossible) to find, but at the conceptual level the difference is still there.
In other words, in general case different pointer types have different representations. int * is different from void * and different from double *. The fact that your platform uses the same representation for void * and int * is nothing more than a coincidence, as far as C language is concerned.
The language states that some pointer types are required to have identical representations, which includes void * vs. char *, pointers to different struct types or, say, int * and const int *. But these are just exceptions from the general rule.
Other people have adequately addressed the case of passing an int * to a prototyped function with a fixed number of arguments that expects a different pointer type.
printf is not such a function. It is a variadic function, so the default argument promotions are used for its anonymous arguments (i.e. everything after the format string) and if the promoted type of each argument does not exactly match the type expected by the format effector, the behavior is undefined. In particular, even if int * and void * have identical representation,
int a;
printf("%p\n", &a);
has undefined behavior.
This is because the layout of the call frame may depend on the exact concrete type of each argument. ABIs that specify different argument areas for pointer and non-pointer types have occurred in real life (e.g. the Motorola 68000 would like you to keep pointers in the address registers and non-pointers in the data registers to the maximum extent possible). I'm not aware of any real-world ABI that segregates distinct pointer types, but it's allowed and it would not surprise me to hear of one.
c11: 7.21.6 Formatted input/output functions (p8):
p The argument shall be a pointer to void. The value of the pointer is
converted to a sequence of printing characters, in an implementation-defined
manner.
In reality except on ancient mainframes/minis, different pointer types are extremely unlikely to have different sizes. However they have different types, and per the specification for printf, calling it with the wrong type argument for the format specifier results in undefined behavior. This means don't do it.
when printing a pointer value with printf, the caller must cast the pointer to void *
Even casting to void * is not sufficient for all pointers.
C has 2 kind of pointer: Pointers to objects and pointers to functions.
Any object pointer can convert to void* with no problem:
printf("My pointer is: %p", (void *)my_ptr); // OK when my_ptr points to an object
A conversion of a pointer to a function to void * is not defined.
Consider a system in 2021 where void * is 64-bit and a function pointer is 128 bit.
C does specify (my emphasis)
Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type. C17dr § 6.3.2.3 6
To print a function pointer could attempt:
printf("My function pointer is: %ju", (uintmax_t) my_function_ptr); // Selectively OK
C lacks a truly universal pointer and lacks a clean way to print function pointers.
Addressing the question:
When will the two calls generate different stack frames?
The compiler may notice that the behaviour is undefined, and issue an exception, illegal instruction, etc. There's no requirement for the compiler to attempt to generate a stack frame, function call or whatever.
See here for an example of the compiler doing this in another case of UB . Instead of generating a deference instruction with null argument, it generates ud2 illegal instruction.
When the behaviour is undefined according to the language standard, there are no requirements on the compiler's behaviour.

Difference between Function pointer casting in c and c++

The behavior of my code is different in c and c++.
void *(*funcPtr)() = dlsym(some symbol..) ; // (1) works in c but not c++
int (*funcPtr)();
*(void**)(&funcPtr) = dlsym(some symbol..) ; // (2) works in c++
I dont understand how the 2nd casting works in c++ while the 1st casting doesnt work in c++. The error message for (1) show is invalid conversion from void* to void*() in c++.
The problem is that dlsym returns a void* pointer. While in C, any such pointer is implicitly convertible into any other (object!) pointer type (for comparison: casting the result of malloc), this is not the case in C++ (here you need to cast the result of malloc).
For function pointers, though, this cast is not implicit even in C. Apparently, as your code compiles, your compiler added this implicit cast for function pointers, too, as a compiler extension (in consistency with object pointers); however, for being fully standard compliant, it actually should issue a diagnostic, e. g. a compiler warning, mandated by C17 6.5.4.3:
Conversions that involve pointers, other than where permitted by the constraints of 6.5.16.1, shall be specified by means of an explicit cast.
But now instead of casting the target pointer, you rather should cast the result of dlsym into the appropriate function pointer:
int (*funcPtr)() = reinterpret_cast<int(*)()>(dlsym(some symbol..));
or simpler:
auto funcPtr = reinterpret_cast<int(*)()>(dlsym(some symbol..));
or even:
int (*funcPtr)() = reinterpret_cast<decltype(funcPtr)>(dlsym(some symbol..));
(Last one especially interesting if funcPtr has been declared previously.)
Additionally, you should prefer C++ over C style casts, see my variants above. You get more precise control over what type of cast actually occurs.
Side note: Have you noticed that the two function pointers declared in your question differ in return type (void* vs. int)? Additionally, a function not accepting any arguments in C needs to be declared as void f(void); accordingly, function pointers: void*(*f)(void). C++ allows usage of void for compatibility, while skipping void in C has totally different meaning: The function could accept anything, you need to know from elsewhere (documentation) how many arguments of which type actually can be passed to.
Strictly speaking, none of this code is valid in either language.
void *(*funcPtr)() = dlsym(some symbol..) ;
dlsym returns type void* so this is not valid in either language. The C standard only allows for implicit conversions between void* and pointers to object type, see C17 6.3.2.3:
A pointer to void may be converted to or from a pointer to any object type. A pointer to
any object type may be converted to a pointer to void and back again; the result shall
compare equal to the original pointer.
Specifically, your code is a language violation of one of the rules of simple assignment, C17 6.5.16.1, emphasis mine:
the left operand has atomic, qualified, or unqualified pointer type, and (considering
the type the left operand would have after lvalue conversion) one operand is a pointer
to an object type, and the other is a pointer to a qualified or unqualified version of
void, and the type pointed to by the left has all the qualifiers of the type pointed to
by the right;
The reason why it might compile on certain compilers is overly lax compiler settings. If you are for example using gcc, you need to compile with gcc -std=c17 -pedantic-errors to get a language compliant compiler, otherwise it defaults to the non-standard "gnu11" language.
int (*funcPtr)();
*(void**)(&funcPtr) = dlsym(some symbol..) ; // (2) works in c++
Here you explicitly force the function pointer to become type void**. The cast is fine syntax-wise, but this may or may not be a valid pointer conversion on the specific compiler. Again, conversions between object pointers and function pointers are not supported by the C or C++ standard, but you are relying on non-standard extensions. Formally this code invokes undefined behavior in both C and C++.
In practice, lots of compilers have well-defined behavior here, because most systems have the same representation of object pointers and function pointers.
Given that the conversion is valid, you can of course de-reference a void** to get a void* and then assign another void* to it.

Which cast to use; static_cast or reinterpret_cast?

int i = 1000;
void *p = &i;
int *x = static_cast<int*>(p);
int *y = reinterpret_cast<int*>(p);
which cast should be used to convert from void* to int* and why?
static_cast provided that you know (by design of your program) that the thing pointed to really is an int.
static_cast is designed to reverse any implicit conversion. You converted to void* implicitly, therefore you can (and should) convert back with static_cast if you know that you really are just reversing an earlier conversion.
With that assumption, nothing is being reinterpreted - void is an incomplete type, meaning that it has no values, so at no point are you interpreting either a stored int value "as void" or a stored "void value" as int. void* is just an ugly way of saying, "I don't know the type, but I'm going to pass the pointer on to someone else who does".
reinterpret_cast if you've omitted details that mean you might actually be reading memory using a type other than the type is was written with, and be aware that your code will have limited portability.
By the way, there are not very many good reasons for using a void* pointer in this way in C++. C-style callback interfaces can often be replaced with either a template function (for anything that resembles the standard function qsort) or a virtual interface (for anything that resembles a registered listener). If your C++ code is using some C API then of course you don't have much choice.
In current C++, you can't use reinterpret_cast like in that code. For a conversion of void* to int* you can only use static_cast (or the equivalent C-style cast).
For a conversion between different function type pointers or between different object type pointers you need to use reinterpret_cast.
In C++0x, reinterpret_cast<int*>(p) will be equivalent to static_cast<int*>(p). It's probably incorporated in one of the next WPs.
It's a misconception that reinterpret_cast<T*>(p) would interpret the bits of p as if they were representing a T*. In that case it will read the value of p using p's type, and that value is then converted to a T*. An actual type-pun that directly reads the bits of p using the representation of type T* only happens when you cast to a reference type, as in reinterpret_cast<T*&>(p).
As far as I know, all current compilers allow to reinterpret_cast from void* and behave equivalent to the corresponding static_cast, even though it is not allowed in current C++03. The amount of code broken when it's rejected will be no fun, so there is no motivation for them to forbid it.
When should static_cast, dynamic_cast, const_cast and reinterpret_cast be used? gives some good details.
From the semantics of your problem, I'd go with reinterpret, because that's what you actually do.

Semantics of char a[]

I recently embarrassed myself while explaining to a colleague why
char a[100];
scanf("%s", &a); // notice a & in front of 'a'
is very bad and that the slightly better way to do it is:
char a[100];
scanf("%s", a); // notice no & in front of 'a'
Ok. For everybody getting ready to tell me why scanf should not be used anyway for security reasons: ease up. This question is actually about the meaning of "&a" vs "a".
The thing is, after I explained why it shouldn't work, we tried it (with gcc) and it works =)). I ran a quick
printf("%p %p", a, &a);
and it prints the same address twice.
Can anybody explain to me what's going on?
Well, the &a case should be obvious. You take the address of the array, exactly as expected.
a is a bit more subtle, but the answer is that a is the array. And as any C programmer knows, arrays have a tendency to degenerate into a pointer at the slightest provocation, for example when passing it as a function parameter.
So scanf("%s", a) expects a pointer, not an array, so the array degenerates into a pointer to the first element of the array.
Of course scanf("%s", &a) works too, because that's explicitly the address of the array.
Edit: Oops, looks like I totally failed to consider what argument types scanf actually expects. Both cases yield a pointer to the same address, but of different types. (pointer to char, versus pointer to array of chars).
And I'll gladly admit I don't know enough about the semantics for ellipsis (...), which I've always avoided like the plague, so looks like the conversion to whichever type scanf ends up using may be undefined behavior. Read the comments, and litb's answer. You can usually trust him to get this stuff right. ;)
Well, scanf expects a char* pointer as the next argument when seeing a "%s". But what you give it is a pointer to a char[100]. You give it a char(*)[100]. It's not guaranteed to work at all, because the compiler may use a different representation for array pointers of course. If you turn on warnings for gcc, you will see also the proper warning displayed.
When you provide an argument object that is an argument not having a listed parameter in the function (so, as in the case for scanf when has the vararg style "..." arguments after the format string), the array will degenerate to a pointer to its first element. That is, the compiler will create a char* and pass that to printf.
So, never do it with &a and pass it to scanf using "%s". Good compilers, as comeau, will warn you correctly:
warning: argument is incompatible with corresponding format string conversion
Of course, the &a and (char*)a have the same address stored. But that does not mean you can use &a and (char*)a interchangeably.
Some Standard quotes to especially show how pointer arguments are not converted to void* auto-magically, and how the whole thing is undefined behavior.
Except when it is the operand of the sizeof operator or the unary & operator, or is a
string literal used to initialize an array, an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object. (6.3.2.1/3)
So, that is done always - it isn't mentioned below explicitly anymore when listening valid cases when types may differ.
The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter. The default argument promotions are performed on trailing arguments. (6.5.2.2/7)
About how va_arg behaves extracting the arguments passed to printf, which is a vararg function, emphasis added by me (7.15.1.1/2):
Each invocation of the va_arg macro modifies ap so that the
values of successive arguments are returned in turn. The parameter type shall be a type
name specified such that the type of a pointer to an object that has the specified type can be obtained simply by postfixing a * to type. If there is no actual next argument, or if type is not compatible with the type of the actual next argument (as promoted according to the default argument promotions), the behavior is undefined, except for the following cases:
one type is a signed integer type, the other type is the corresponding unsigned integer
type, and the value is representable in both types;
one type is pointer to void and the other is a pointer to a character type.
Well, here is what that default argument promotion is:
If the expression that denotes the called function has a type that does not include a
prototype, the integer promotions are performed on each argument, and arguments that
have type float are promoted to double. These are called the default argument
promotions. (6.5.2.2/6)
It's been a while since I programmed in C but here's my 2c:
char a[100] doesn't allocate a separate variable for the address of the array, so the memory allocation looks like this:
---+-----+---
...|0..99|...
---+-----+---
^
a == &a
For comparison, if the array was malloc'd then there is a separate variable for the pointer, and a != &a.
char *a;
a = malloc(100);
In this case the memory looks like this:
---+---+---+-----+---
...| a |...|0..99|...
---+---+---+-----+---
^ ^
&a != a
K&R 2nd Ed. p.99 describes it fairly well:
The correspondence between indexing
and pointer arithmetic is very close.
By definition, the value of a variable
or expression of type array is the
address of element zero of the array.
Thus after the assignment pa=&a[0];
pa and a have identical values. Since
the name of the array is a synonym for
the location of the initial element,
the assignment pa=&a[0] can also be
written as pa=a;
A C array can be implicitly converted to a pointer to its first element (C99:TC3 6.3.2.1 §3), ie there are a lot of cases where a (which has type char [100]) will behave the same way as &a[0] (which has type char *). This explains why passing a as argument will work.
But don't start thinking this will always be the case: There are important differences between arrays and pointers, eg regarding assignment, sizeof and whatever else I can't think of right now...
&a is actually one of these pitfalls: This will create a pointer to the array, ie it has type char (*) [100] (and not char **). This means &a and &a[0] will point to the same memory location, but will have different types.
As far as I know, there is no implicit conversion between these types and they are not guaranteed to have a compatible representation as well. All I could find is C99:TC3 6.2.5 §27, which doesn't says much about pointers to arrays:
[...] Pointers to other types need not have the same representation or alignment requirements.
But there's also 6.3.2.3 §7:
[...] When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
So the cast (char *)&a should work as expected. Actually, I'm assuming here that the lowest addressed byte of an array will be the lowest addressed byte of its first element - not sure if this is guaranteed, or if a compiler is free to add arbitrary padding in front of an array, but if so, that would be seriously weird...
Anyway for this to work, &a still has to be cast to char * (or void * - the standard guarantees that these types have compatible representations). The problem is that there won't be any conversions applied to variable arguments aside from the default argument promotion, ie you have to do the cast explicitly yourself.
To summarize:
&a is of type char (*) [100], which might have a different bit-representation than char *. Therefore, an explicit cast must be done by the programmer, because for variable arguments, the compiler can't know to what it should convert the value. This means only the default argument promotion will be done, which, as litb pointed out, does not include a conversion to void *. It follows that:
scanf("%s", a); - good
scanf("%s", &a); - bad
scanf("%s", (char *)&a); - should be ok
Sorry, a tiny bit off topic:
This reminded me of an article I read about 8 years ago when I was coding C full time. I can't find the article but I think it was titled "arrays are not pointers" or something like that. Anyway, I did come across this C arrays and pointers FAQ which is interesting reading.
char [100] is a complex type of 100 adjacent char's, whose sizeof equals to 100.
Being casted to a pointer ((void*) a), this variable yields the address of the first char.
Reference to the variable of this type (&a) yields address of the whole variable, which, in turn, also happens to be the address of the first char